In the analysis of cluster data, the regression coefficients are frequently
assumed to be the same across all clusters. This hampers
the ability to study the varying impacts of factors on each cluster. 
In this paper, a semiparametric model is introduced to account for 
varying impacts of factors over clusters by using cluster-level co- 
variates. It achieves the parsimony of parametrization and allows 
the explorations of nonlinear interactions. The random effect in the 
semiparametric model also accounts for within-cluster correlation. 
A local linear-based estimation procedure is proposed for estimating
functional coefficients, residual variance and within-cluster correlation
matrix. The asymptotic properties of the proposed estimators
are established, and the method for constructing simultaneous confidence
bands is proposed and studied. In addition, relevant hypothesis
testing problems are addressed. Simulation studies are carried
out to demonstrate the methodological power of the proposed meth- 
ods in the finite sample. The proposed model and methods are used 
to analyse the second birth interval in Bangladesh, leading to some 
interesting findings. 

1. Introduction. 

1.1. Preamble. Longitudinal data analysis has attracted considerable attention
in the literature. For longitudinal data, the data from the same cluster
are dependent on each other. As far as modeling is concerned, this
within-cluster dependency is usually accounted for by random cluster effects
and modeled by a within-cluster correlation matrix. The within-cluster correlation
matrix plays a very important role in longitudinal data analysis, as
it can be used to improve the efficiency of the estimation. Actually, most of
the existing literature is devoted to addressing how to make use of within- 
cluster correlation matrix to improve the estimation for unknown parameters 
or functions. 

The methodology for parametric based longitudinal data analysis is quite 
mature (see, e.g., Diggle, Heagerty, Liang and Zeger [6] and the references 
therein). The situation with nonparametric based longitudinal data anal- 
ysis is very different. One of the main difficulties is how to incorporate 
the within cluster correlation structure into the estimation procedure. Lin 
and Carroll [18] recommend that we ignore the within-cluster correlation 
when kernel smoothing is employed. Welsh, Lin and Carroll [27] investigate 
the possibility of using weighted least squares based on the within-cluster 
correlation structure when spline smoothing is used. They suggest that the 
weighted least squares estimator based on the true within-cluster correlation 
structure works better than the estimator based on working independence 
when spline smoothing is used. Other literature about nonparametric lon- 
gitudinal regression includes Zeger and Diggle [31], Brumback and Rice [2], 
Hoover et al. [14], Wu et al. [28], Martinussen and Scheike [21], Chiang et 
al. [3], Huang et al. [15], Wang [25], Fan and Li [8], Chiou and Müller [4],
Wang, Carroll and Lin [26], Qu and Li [23], Lin and Carroll [19], Sun et 
al. [24] and Fan and Wu [11], among others. 

Most of the literature assumes that the regression parameters or functions 
are the same across all clusters. However, when the regression effects of some 
particular clusters are of interest, it is unreasonable to assume the regression 
parameters or functions are the same across all clusters. The interactions of 
regression effects with clusters are of interest. A naive method to address 
this is to let the regression coefficients or functions vary freely over clusters. 
However, this naive method will not be parsimonious, particularly when the 
number of clusters is large and the issue of estimability arises. In addition, 
the within-cluster dependency may be addressed by the random effect. This 
leads us to model these cluster-dependent regression coefficients or functions 
by using cluster level variables. It addresses, simultaneously, the parsimony 
and cluster dependency of modeling. 

1.2. A motivating example. The data that stimulates this project is from 
Bangladesh and concerns the second birth interval, which is defined as the 
duration between the first birth and the second birth. The data comes from 
the Bangladesh Demographic and Health Survey (BDHS) of 1996-1997 (Mi- 
tra et al. [22]), which is a cross-sectional, nationally representative survey 
of ever-married women aged between 10 and 49. Of interest is how some 
factors that are commonly found to be associated with contraceptive use in 
Bangladesh, such as education and religion, affect the second birth interval. 
The data were collected from different districts (clusters) in the six different 
divisions of Bangladesh. Bangladesh is divided into six administrative divisions:
Barisal, Chittagong, Dhaka, Khulna, Rajshahi and Sylhet. The data
from the same cluster are correlated with each other, due to cluster-level
factors such as cultural norms and access to family planning programs. Of
particular interest is how the factors affect the second birth interval in
some particular clusters, for example, how these factors affect the second
birth interval in a rural area in Chittagong division.

Some of the factors of interest are defined on the individual level, such 
as education, and are called individual level variables. Some of the factors 
are defined on the cluster level, such as type of region of residence, and are 
called cluster-level variables. We use $y_{ij}$ to denote the length of the second
birth interval of the $j$th woman in the $i$th cluster, $X_{ij}$ to denote the vector
of the corresponding individual level variables and $Z_i$ to denote the vector
of the cluster level variables.

Frequently, the linear model

(1.1)  $y_{ij} = X_{ij}^T a + Z_i^T \beta + \varepsilon_{ij}, \quad j = 1, \ldots, n_i,\ i = 1, \ldots, m,$

would be used to fit the data. The within-cluster dependency would be
accounted for by $\varepsilon_{ij}$, $j = 1, \ldots, n_i$, being correlated. The covariance matrix of
$\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{in_i})^T$ can be incorporated into the estimation procedure.

Model (1.1) would be fine if the interest focuses only on the global impact
of the factors. However, if the picture for a particular cluster is of interest,
(1.1) would not be adequate. Let's take education as an example. It is evident
that the impact of education in the cluster where Muslims predominate
would be different from that in the cluster where Hindus predominate. To take the
difference of this kind into account, we may relax the assumption imposed
on (1.1) and allow the regression coefficients to vary over clusters. This leads
to

(1.2)  $y_{ij} = X_{ij}^T a_i + Z_i^T \beta + \varepsilon_{ij}, \quad j = 1, \ldots, n_i,\ i = 1, \ldots, m.$

While it accounts for the varying impact across clusters, (1.2) is not parsimonious.
In fact, (1.2) involves $pm + q$ regression coefficients, where $p$ and $q$
are the dimensions of $X_{ij}$ and $Z_i$, respectively. When the number of clusters
$m$ is large, there would be too many unknown parameters in model (1.2)
for us to get reasonably accurate estimators. In longitudinal data analysis,
we often come across a large number of clusters. For example, there are 296
clusters in the second birth interval data set that motivates this paper. If
we use (1.2) to fit the data, we would face $296p + q$ unknown coefficients,
and would certainly pay a big price on the variances of the resulting estimators.

A sensible approach is to model the factor loadings $a_i$ by using cluster-level
variables. A reasonable model is

(1.3)  $y_{ij} = X_{ij}^T a_i + Z_i^T \beta + \varepsilon_{ij},$
       $a_i = \alpha_0 + A Z_i + e_i,$




W. ZHANG, J. FAN AND Y. SUN 



where $A = (\alpha_1, \ldots, \alpha_q)$, and $e_i$ ($i = 1, \ldots, m$) are random effects with mean
zero. This achieves, simultaneously, the parsimony and within-cluster dependency,
and the cluster-dependent factor loadings are allowed. In fact,
the number of unknown coefficients in model (1.3) is $p(q + 1) + q$, which is
usually much smaller than $pm + q$.
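
As a quick check on the parsimony claim, the following sketch counts the regression coefficients under (1.2) and (1.3). The function names are illustrative only. The values p = 3 and q = 6 are an assumption taken from the covariates used later in the real data analysis (three individual-level variables and six cluster-level dummy variables); m = 296 is the number of clusters quoted above.

```python
def n_coef_free(p, q, m):
    """Coefficients in (1.2): one p-vector a_i per cluster plus the q-vector beta."""
    return p * m + q


def n_coef_structured(p, q):
    """Coefficients in (1.3): alpha_0 (p entries), A (p x q entries) and beta (q entries)."""
    return p * (q + 1) + q


# Assumed dimensions for illustration: p = 3, q = 6, m = 296 clusters.
print(n_coef_free(3, 6, 296), n_coef_structured(3, 6))
```

Under these assumed dimensions the unstructured model has several hundred coefficients, while the structured model has a few dozen and does not grow with m.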

A further extension of model (1.3) is to let the factor loadings vary with
time as the society and technology evolve with time. By allowing the impacts
to vary with time, we come up with the model

(1.4)  $y_{ij} = X_{ij}^T a_i(U_{ij}) + Z_i^T \beta(U_{ij}) + \varepsilon_{ij},$
       $a_i(U_{ij}) = \alpha_0(U_{ij}) + A(U_{ij}) Z_i + e_i,$

where $U_{ij}$ is time, $A(U_{ij}) = (\alpha_1(U_{ij}), \ldots, \alpha_q(U_{ij}))$. Model (1.4) is the model
that we are going to address. It is a kind of varying coefficient model (Xia
and Li [30], Fan and Zhang [9, 10], Zhang et al. [32] and Li and Liang [17]).
To make (1.4) more mathematically clear and general, from now on, $U_{ij}$
is not necessarily time, and it can be any continuous covariate. This
allows the nonlinear interaction of individual variables $X_{ij}$ and cluster level
variable $Z_i$ with $U_{ij}$. We assume that $\varepsilon_{ij}$ is measurement error with mean 0
and variance $\sigma^2$ and independent of $X_{ij}$, $U_{ij}$ and $Z_i$, and that $\{e_i\}$ are i.i.d.
random effects with mean $0_{p \times 1}$ and covariance matrix $\Sigma$ and independent
of all other random variables. We assume that $\{(X_{ij}^T, U_{ij})^T\}$ are i.i.d., and
so are $\{Z_i\}$.

In (1.4), $\beta(\cdot)$, $\alpha_k(\cdot)$, $k = 0, \ldots, q$, are unknown functions to be estimated,
and so are $\sigma^2$ and $\Sigma$. Although (1.4) is motivated by the second birth interval
data, the modeling concept and estimation methodology, which this paper
aims to explore, are equally applicable to other kinds of data, such as the
data obtained from medicine and engineering.

The paper is organized as follows. We begin, in Section 2, with a description
of the estimation procedure for the proposed model (1.4). In Section
3, we establish the asymptotic properties of the proposed estimators. Hypothesis
tests associated with model (1.4) are discussed in Section 4. The performance
of the method is assessed by a simulation study in Section 5. In
Section 6, we use the proposed model and estimation procedure to analyse
the data on the second birth intervals in Bangladesh and explore how the
impacts of the factors of interest on the length of second birth intervals in
some particular clusters change over time.

2. Estimation procedure. In this section, we construct the
estimation procedure for the proposed model (1.4). We estimate the unknown
functional coefficients first, then $\sigma^2$ and $\Sigma$.
2.1. Estimation of functional coefficients. By Taylor's expansion, we have

  $\alpha_k(U_{ij}) \approx \alpha_k(u) + \dot\alpha_k(u)(U_{ij} - u), \quad k = 0, \ldots, q,$
  $\beta(U_{ij}) \approx \beta(u) + \dot\beta(u)(U_{ij} - u),$

when $U_{ij}$ is in a small neighborhood of $u$, which leads to the local least
squares estimation procedure

(2.1)  $L = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \Big[ y_{ij} - X_{ij}^T \sum_{k=0}^{q} \{b_k + c_k(U_{ij} - u)\} z_{ik} - Z_i^T \{\mathbf{a} + d(U_{ij} - u)\} \Big]^2 K_h(U_{ij} - u),$

where $z_{i0} = 1$, $K_h(\cdot) = K(\cdot/h)/h$, $h$ is a bandwidth, and $K(\cdot)$ is a kernel
function. We minimize $L$ with respect to $\mathbf{a}$, $d$, $b_k$, $c_k$, $k = 0, \ldots, q$, to get
the minimizer $\hat{\mathbf{a}}$, $\hat d$, $\hat b_k$, $\hat c_k$, $k = 0, \ldots, q$. We use $\hat b_k$ to estimate $\alpha_k(u)$ and
$\hat{\mathbf{a}}$ to estimate $\beta(u)$. From now on, we denote $\hat b_k$ by $\hat\alpha_k(u)$ and $\hat{\mathbf{a}}$ by $\hat\beta(u)$.
Let

  $X_i = (X_{i1}, \ldots, X_{in_i})^T, \quad \Gamma_i = (X_i\{(1, Z_i^T) \otimes I_p\},\ 1_{n_i} \otimes Z_i^T),$
  $\Gamma = (\Gamma_1^T, \ldots, \Gamma_m^T)^T, \quad \mathcal{X} = (\Gamma, \mathcal{U}\Gamma),$
  $\mathcal{U}^k = \mathrm{diag}((U_{11} - u)^k, \ldots, (U_{1n_1} - u)^k, \ldots, (U_{m1} - u)^k, \ldots, (U_{mn_m} - u)^k),$
  $W = \mathrm{diag}(K_h(U_{11} - u), \ldots, K_h(U_{1n_1} - u), \ldots, K_h(U_{m1} - u), \ldots, K_h(U_{mn_m} - u)),$
  $Y = (y_{11}, \ldots, y_{1n_1}, \ldots, y_{m1}, \ldots, y_{mn_m})^T,$

where $I_p$ is the $p \times p$ identity matrix and $1_d$ is a $d$-dimensional vector with
each component being 1. By a simple calculation, we have

(2.2)  $\hat\alpha_k(u) = A_k Y \ (k = 0, \ldots, q), \quad \hat\beta(u) = B Y,$

where

  $A_k = (e_{k+1,q+1}^T \otimes I_p,\ 0_{p \times (q+s)})(\mathcal{X}^T W \mathcal{X})^{-1} \mathcal{X}^T W,$
  $B = (0_{q \times ((q+1)p)},\ I_q,\ 0_{q \times s})(\mathcal{X}^T W \mathcal{X})^{-1} \mathcal{X}^T W$

with $e_{k,p}$ denoting the unit vector of length $p$ with 1 at position $k$, $0_{p \times q}$ the
size $p \times q$ matrix with all entries 0 and $s = (q + 1)p + q$.
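
The weighted least squares solution behind (2.2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the argument layout (a flat array of observations with an integer `cluster` index) and the choice of the Epanechnikov kernel are assumptions made for the example. Each design row is $(X_{ij}^T, Z_i^T \otimes X_{ij}^T, Z_i^T)$, so the first $s$ fitted coefficients are $(\hat\alpha_0(u)^T, \ldots, \hat\alpha_q(u)^T, \hat\beta(u)^T)^T$.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel 0.75 * (1 - t^2)_+ ."""
    return 0.75 * np.clip(1.0 - t ** 2, 0.0, None)


def design_rows(X, Z, cluster):
    """Rows (X_ij^T, Z_i^T kron X_ij^T, Z_i^T) of Gamma, one per observation.

    X: (n, p) individual-level covariates; Z: (m, q) cluster-level covariates;
    cluster: (n,) integer index of the cluster each observation belongs to.
    """
    z_obs = Z[cluster]                                    # (n, q)
    lead = np.hstack([np.ones((X.shape[0], 1)), z_obs])   # rows (1, Z_i^T)
    # Row-wise Kronecker product (1, Z_i^T) kron X_ij^T.
    kron = (lead[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)
    return np.hstack([kron, z_obs])                       # (n, s)


def local_linear_fit(u, y, X, Z, U, cluster, h):
    """Local linear estimates at u: alpha_hat of shape (q + 1, p), beta_hat of shape (q,)."""
    p, q = X.shape[1], Z.shape[1]
    gamma = design_rows(X, Z, cluster)
    s = gamma.shape[1]
    design = np.hstack([gamma, (U - u)[:, None] * gamma])  # (Gamma, U Gamma)
    w = epanechnikov((U - u) / h) / h                      # K_h(U_ij - u)
    sw = np.sqrt(w)
    # Weighted least squares via an ordinary least squares fit on sqrt(w)-scaled data.
    coef, *_ = np.linalg.lstsq(sw[:, None] * design, sw * y, rcond=None)
    alpha_hat = coef[: (q + 1) * p].reshape(q + 1, p)      # row k holds alpha_k(u)
    beta_hat = coef[(q + 1) * p : s]
    return alpha_hat, beta_hat
```

Solving through `lstsq` avoids forming $(\mathcal{X}^T W \mathcal{X})^{-1}$ explicitly, which is usually the more stable choice when few observations fall inside the kernel window.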

In practice, some coefficients in (1.4) are constant. Under such a situa- 
tion, model (1.4) becomes a semivarying coefficient mixed effects model. An 
interesting question is how to estimate the constant coefficients. Fan and 
Zhang [9] studied a varying coefficient model with coefficients having different
degrees of smoothness. They proposed a two-stage estimation procedure.
Based on their idea, we propose a very simple estimation procedure for the
unknown constant coefficients. Suppose that the $j$th component $\alpha_{kj}(u)$ of
$\alpha_k(u)$ is a constant that is denoted by $C_{kj}$. We first pretend that $\alpha_{kj}(u)$
is a function and get the estimator $\hat\alpha_{kj}(U_{il})$ of $\alpha_{kj}(u)$ at $U_{il}$, $l = 1, \ldots, n_i$,
$i = 1, \ldots, m$, by the above estimation procedure. Then, take the average of
$\hat\alpha_{kj}(U_{il})$ over $l = 1, \ldots, n_i$, $i = 1, \ldots, m$. This average

(2.3)  $\hat C_{kj} = \frac{1}{n} \sum_{i=1}^{m} \sum_{l=1}^{n_i} \hat\alpha_{kj}(U_{il}), \quad n = \sum_{i=1}^{m} n_i,$

is our estimator of $C_{kj}$. We will show, later, that the convergence rate of this
estimator is of order $O_P(n^{-1/2})$, when the bandwidth is properly selected.
This provides a simple method for estimating the constant coefficients.

A more efficient estimate for the constant coefficient can be obtained 
by using the profile likelihood method (see, e.g., Lam and Fan [16]). For 
simplicity, we do not pursue this further. 
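
The averaging step (2.3) is simple enough to state as code. In this sketch `pointwise_estimate` is a hypothetical callable that returns the estimate of $\alpha_{kj}(u)$ at a given $u$ (for instance one component of a local linear fit); it is not part of the paper and stands in for whatever pointwise estimator is used.

```python
import numpy as np


def constant_coefficient_estimate(pointwise_estimate, U):
    """Average the pointwise estimates over all observed U_il, as in (2.3).

    pointwise_estimate: hypothetical callable u -> estimate of alpha_kj(u).
    U: 1-D array holding every observed U_il (length n).
    """
    values = np.array([pointwise_estimate(u) for u in np.asarray(U, dtype=float)])
    return float(values.mean())
```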

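2.2. Estimation of $\sigma^2$ and $\Sigma$. Let $\tilde a_i(U_{ij}) = \alpha_0(U_{ij}) + \sum_{k=1}^{q} \alpha_k(U_{ij}) z_{ik}$
and $r_i = (r_{i1}, \ldots, r_{in_i})^T$ where $r_{ij} = y_{ij} - X_{ij}^T \tilde a_i(U_{ij}) - Z_i^T \beta(U_{ij})$. Correspondingly,
let $\hat a_i(\cdot)$ and $\hat r_i$ be their substitution estimators. Set

  $X_i = (X_{i1}, \ldots, X_{in_i})^T$ and $P_i = X_i (X_i^T X_i)^{-1} X_i^T$.

For each given $i$, we have the linear model

(2.4)  $r_i = X_i e_i + \varepsilon_i, \quad \varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{in_i})^T.$

The residual sum of squares of this linear model

  $\mathrm{rss}_i = r_i^T (I_{n_i} - P_i) r_i$

would be the raw material for estimating $\sigma^2$. The degree of freedom of $\mathrm{rss}_i$ is
$n_i - p$. Let $\mathrm{RSS}_i$ be $\mathrm{rss}_i$ with $r_i$ replaced by $\hat r_i$. Pooling all $\{\mathrm{RSS}_i\}$ together
leads to the estimator of $\sigma^2$ as

  $\hat\sigma^2 = (n - mp)^{-1} \sum_{i=1}^{m} \mathrm{RSS}_i, \quad n = \sum_{i=1}^{m} n_i.$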
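
A minimal sketch of the pooled variance estimator follows. It assumes the residual vectors $\hat r_i$ have already been computed and are passed in as a list, one array per cluster, alongside the matching design blocks $X_i$; the function name and this list-based interface are illustrative choices, not the paper's.

```python
import numpy as np


def pooled_sigma2(residuals, X_blocks):
    """Pooled estimator (n - m p)^{-1} * sum_i r_i^T (I - P_i) r_i.

    residuals: list of 1-D arrays r_i (length n_i); X_blocks: list of (n_i, p) arrays X_i.
    """
    m = len(X_blocks)
    p = X_blocks[0].shape[1]
    rss, n = 0.0, 0
    for r, Xi in zip(residuals, X_blocks):
        # Projecting r_i onto the columns of X_i gives P_i r_i without forming P_i.
        e_ls, *_ = np.linalg.lstsq(Xi, r, rcond=None)
        resid = r - Xi @ e_ls
        rss += float(resid @ resid)
        n += len(r)
    return rss / (n - m * p)
```

Each cluster needs $n_i > p$ observations to contribute a positive number of degrees of freedom, which is why the divisor is $n - mp$ rather than $n$.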

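Finally, we estimate $\Sigma$. From (2.4), we have the least squares estimator
of $e_i$ as

  $\tilde e_i = (X_i^T X_i)^{-1} X_i^T r_i = e_i + (X_i^T X_i)^{-1} X_i^T \varepsilon_i,$

which leads to

  $\sum_{i=1}^{m} \tilde e_i \tilde e_i^T = \sum_{i=1}^{m} e_i e_i^T + \sum_{i=1}^{m} (X_i^T X_i)^{-1} X_i^T \varepsilon_i \varepsilon_i^T X_i (X_i^T X_i)^{-1} + \sum_{i=1}^{m} (X_i^T X_i)^{-1} X_i^T \varepsilon_i e_i^T$
  $\qquad + \sum_{i=1}^{m} e_i \varepsilon_i^T X_i (X_i^T X_i)^{-1}.$

The last two terms are of order $O_P(m^{1/2})$, so they are negligible compared
with the first two. Hence,

  $m^{-1} \sum_{i=1}^{m} \tilde e_i \tilde e_i^T \approx m^{-1} \Big\{ \sum_{i=1}^{m} e_i e_i^T + \sum_{i=1}^{m} (X_i^T X_i)^{-1} X_i^T \varepsilon_i \varepsilon_i^T X_i (X_i^T X_i)^{-1} \Big\} \approx \Sigma + \sigma^2 m^{-1} \sum_{i=1}^{m} (X_i^T X_i)^{-1}.$

Therefore, we have

(2.5)  $\hat\Sigma = m^{-1} \sum_{i=1}^{m} \hat e_i \hat e_i^T - m^{-1} \hat\sigma^2 \sum_{i=1}^{m} (X_i^T X_i)^{-1}$

to estimate $\Sigma$. In (2.5), $\hat e_i$ is $\tilde e_i$ with $r_i$ replaced by $\hat r_i$.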
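
The moment estimator (2.5) can be sketched as follows, again assuming the residual vectors $\hat r_i$ and an estimate of $\sigma^2$ are available. The function name and interface are illustrative. Note that, being a difference of two matrices, the result is not guaranteed to be positive semidefinite in small samples; the sketch makes no attempt to correct for that.

```python
import numpy as np


def sigma_matrix_estimate(residuals, X_blocks, sigma2):
    """Estimator (2.5): m^{-1} sum_i e_i e_i^T - m^{-1} sigma2 sum_i (X_i^T X_i)^{-1}.

    residuals: list of 1-D arrays r_i; X_blocks: list of (n_i, p) arrays X_i;
    sigma2: estimate of the measurement error variance.
    """
    m = len(X_blocks)
    p = X_blocks[0].shape[1]
    outer_sum = np.zeros((p, p))
    inv_sum = np.zeros((p, p))
    for r, Xi in zip(residuals, X_blocks):
        xtx_inv = np.linalg.inv(Xi.T @ Xi)
        e_ls = xtx_inv @ Xi.T @ r           # least squares estimate of e_i
        outer_sum += np.outer(e_ls, e_ls)
        inv_sum += xtx_inv
    return outer_sum / m - sigma2 * inv_sum / m
```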



3. Asymptotic properties. In this section, we present the
asymptotic properties of the proposed estimators. For any $p \times p$ symmetric
matrix $A$, we use $\mathrm{vech}(A)$ to denote the vector consisting of all elements
on and below the diagonal of the matrix $A$, and we use $\mathrm{vec}(A)$ to denote
the vector obtained by stacking the column vectors of matrix $A$ below one
another. Obviously, there exists a unique $p^2 \times p(p + 1)/2$ matrix $R_p$ such
that $\mathrm{vec}(A) = R_p \mathrm{vech}(A)$.

We first introduce some notation. For any function or function vector
$g(\cdot)$, we use $\dot g(\cdot)$ and $\ddot g(\cdot)$ to denote its first and second derivatives, respectively.
We use $\mathcal{D}$ to denote the collection of all individual and cluster level
covariates. Let $\mu_i = \int t^i K(t)\,dt$, $\nu_i = \int t^i K^2(t)\,dt$, and let

  $\Omega(u) = E\{(X^T, Z^T \otimes X^T, Z^T)^T (X^T, Z^T \otimes X^T, Z^T) \mid U = u\} = \begin{pmatrix} \Omega_1(u) & \Omega_2(u) \\ \Omega_2(u)^T & \Omega_3(u) \end{pmatrix},$

where $\Omega_1(u)$ and $\Omega_3(\cdot)$ are, respectively, $p(q + 1) \times p(q + 1)$ and $q \times q$ submatrices
of $\Omega(u)$. Without loss of generality, we will assume that $\mu_0 = 1$.

Our main asymptotic results are presented through the following 6 theo- 
rems. We leave the proofs of these theorems to the Appendix. The first two 
theorems give the asymptotic normality of the estimated functional coeffi- 
cients. 
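Theorem 1. Under the conditions (1)-(6) in the Appendix, when $nh^5$
is bounded, for $k = 0, \ldots, q$, we have

  $\sqrt{nhf(u)}\,\Big\{\hat\alpha_k(u) - \alpha_k(u) - h^2 \frac{\mu_2}{2} \ddot\alpha_k(u)\Big\}$
  $\qquad \stackrel{D}{\longrightarrow} N_p\big(0_{p \times 1},\ \nu_0 \{e_{(k+1),(q+1)}^T \otimes I_p\}[\sigma^2 \Delta_1(u)^{-1} + \Theta_1(u)]\{e_{(k+1),(q+1)} \otimes I_p\}\big),$

where

  $\Delta_1(u) = \Omega_1(u) - \Omega_2(u)\Omega_3(u)^{-1}\Omega_2(u)^T, \quad \Theta_1(u) = T_1(u)\Xi_1(u)T_1(u)^T,$
  $T_1(u) = (\Delta_1(u)^{-1}, \Delta_2(u)), \quad \Delta_2(u) = -\Delta_1(u)^{-1}\Omega_2(u)\Omega_3(u)^{-1}$

and

  $\Xi_1(u) = E\{X^T \Sigma X (X^T, Z^T \otimes X^T, Z^T)^T (X^T, Z^T \otimes X^T, Z^T) \mid U = u\}.$

Theorem 2. Under the conditions (1)-(6) in the Appendix, when $nh^5$ is
bounded, we have

  $\sqrt{nhf(u)}\,\Big\{\hat\beta(u) - \beta(u) - h^2 \frac{\mu_2}{2} \ddot\beta(u)\Big\} \stackrel{D}{\longrightarrow} N_q\big(0_{q \times 1},\ \nu_0[\sigma^2 \Delta_3(u) + \Theta_2(u)]\big),$

where $\Delta_3(u) = \Omega_3(u)^{-1} + \Omega_3(u)^{-1}\Omega_2(u)^T \Delta_1(u)^{-1} \Omega_2(u)\Omega_3(u)^{-1}$, $\Theta_2(u) =
T_2(u)\Xi_1(u)T_2(u)^T$ and $T_2(u) = (\Delta_2(u)^T, \Delta_3(u))$.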

We now present the asymptotic normality for the parametric component.
To present the asymptotic property of $\hat\sigma^2$, we assume that the following limits
exist and are finite: $c_1 = \lim_{n \to \infty} n/(n - mp)$, $c_2 = \lim_{m \to \infty} m/(n - mp)$ and

  $\gamma = \operatorname*{plim}_{n \to \infty}\ (n - mp)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} [X_{ij}^T (X_i^T X_i)^{-1} X_{ij}]^2,$

where "plim" denotes convergence in probability.

Theorem 3. Under the conditions (1)-(7) in the Appendix, when $nh^8 \to
0$, we have

  $\sqrt{n}\{\hat\sigma^2 - \sigma^2\} \stackrel{D}{\longrightarrow} N\big(0,\ 2\sigma^4 c_1(c_1 - 1 - \gamma) + \mathrm{var}(\varepsilon_{11}^2) c_1 (2 - c_1 + \gamma)\big).$

Theorem 3 suggests the estimator $\hat\sigma^2$ is of convergence rate $O_P(n^{-1/2})$,
which is the optimal convergence rate of the parametric estimator.
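Additional notation is needed for presenting the asymptotic normality of
$\hat\Sigma$. Write

  $\Lambda_1 = \operatorname{plim}\ m^{-1} \sum_{i=1}^{m} [(X_i^T X_i)^{-1}],$
  $\Lambda_2 = \operatorname{plim}\ m^{-1} \sum_{i=1}^{m} [(X_i^T X_i)^{-1} \otimes (X_i^T X_i)^{-1}],$
  $\Lambda_3 = \begin{pmatrix} \Sigma \otimes \Lambda_{1(1)} + \Lambda_1 \otimes \Sigma_{(1)} \\ \vdots \\ \Sigma \otimes \Lambda_{1(p)} + \Lambda_1 \otimes \Sigma_{(p)} \end{pmatrix},$

where $\Lambda_{1(r)}$, $\Sigma_{(r)}$ ($r = 1, \ldots, p$) denote the $r$th row of $\Lambda_1$, $\Sigma$, respectively. Let

  $\Lambda_4 = \operatorname{plim}\ m^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} [\mathrm{vec}((X_i^T X_i)^{-1} X_{ij} X_{ij}^T (X_i^T X_i)^{-1})\, \mathrm{vec}((X_i^T X_i)^{-1} X_{ij} X_{ij}^T (X_i^T X_i)^{-1})^T],$
  $\Lambda_5 = \operatorname{plim}\ m^{-1} \sum_{i=1}^{m} [(X_i^T X_i)^{-1} X_i^T \tilde P_i X_i (X_i^T X_i)^{-1}],$

where $\tilde P_i$ is a diagonal matrix generated from the diagonal elements of $P_i$.

Theorem 4. Under the conditions (1)-(7) in the Appendix, when $nh^8 \to 0$,
we have

  $\sqrt{n}\, \mathrm{vech}(\hat\Sigma - \Sigma) \stackrel{D}{\longrightarrow} N_r\big(0_{r \times 1},\ (1/c_2 + p)(R_p^T R_p)^{-1} R_p^T \Delta R_p (R_p^T R_p)^{-1}\big),$

where $r = p(p + 1)/2$ and

  $\Delta = E\{(e_1 e_1^T) \otimes (e_1 e_1^T)\} - \mathrm{vec}(\Sigma)\mathrm{vec}(\Sigma)^T + 2\sigma^4 \Lambda_2$
  $\qquad + \sigma^2\{\Sigma \otimes \Lambda_1 + \Lambda_1 \otimes \Sigma + \Lambda_3\}$
  $\qquad + [\mathrm{var}(\varepsilon_{11}^2) - 2\sigma^4]\{\Lambda_4 - c_2[2\mathrm{vec}(\Lambda_1)\mathrm{vec}(\Lambda_1)^T - \mathrm{vec}(\Lambda_1)\mathrm{vec}(\Lambda_2)^T - \mathrm{vec}(\Lambda_2)\mathrm{vec}(\Lambda_1)^T]\}$
  $\qquad + \{2\sigma^4 c_1(c_1 - 1 - \gamma) + \mathrm{var}(\varepsilon_{11}^2) c_1(2 - c_1 + \gamma)\}\mathrm{vec}(\Lambda_1)\mathrm{vec}(\Lambda_1)^T.$

Theorem 4 shows that the estimator $\hat\Sigma$ also achieves the convergence rate
of $O_P(n^{-1/2})$.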

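Theorem 5. When $\alpha_{kj}(u)$ is a constant, under the conditions (1)-(6)
and (8) in the Appendix and $nh^4 \to 0$, we have

  $\sqrt{n}\{\hat C_{kj} - C_{kj}\} \stackrel{D}{\longrightarrow} N\big(0,\ e_{kp+j,(q+1)p}^T \{\sigma^2 E[\Delta_1(U)^{-1}] + E[\Theta_1(U)] + \Xi_2\} e_{kp+j,(q+1)p}\big),$

where

  $\Xi_2 = \operatorname*{plim}_{n \to \infty}\ \frac{1}{n} \sum_{i=1}^{m} \sum_{l=1}^{n_i} \sum_{r=1, r \ne l}^{n_i} E\{T_1(U_{il}) E[X_{il}^T \Sigma X_{ir} \Gamma_{il} \Gamma_{ir}^T \mid U_{il}, U_{ir}] T_1(U_{ir})^T\},$

with $\Gamma_{ij}^T = (X_{ij}^T, Z_i^T \otimes X_{ij}^T, Z_i^T)$ being the $j$th row of matrix $\Gamma_i$.

When the $j$th ($j = 1, \ldots, q$) component $\beta_j(u)$ of $\beta(u)$ is a constant, Theorem 5
applies to its estimator as well, except that the variance is

  $e_{j,q}^T \{\sigma^2 E[\Delta_3(U)^{-1}] + E[\Theta_2(U)] + \Xi_3\} e_{j,q},$

where

  $\Xi_3 = \operatorname*{plim}_{n \to \infty}\ \frac{1}{n} \sum_{i=1}^{m} \sum_{l=1}^{n_i} \sum_{r=1, r \ne l}^{n_i} E\{T_2(U_{il}) E[X_{il}^T \Sigma X_{ir} \Gamma_{il} \Gamma_{ir}^T \mid U_{il}, U_{ir}] T_2(U_{ir})^T\}.$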

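Theorem 6. Under the conditions (1)-(6), (8) and (9) in the Appendix,
with $K(t)$ having a compact support $[-c_0, c_0]$ and $h = n^{-\rho}$, $1/5 < \rho < 1/3$, for
all $u \in [a, b]$, we have

  $P\Big[(-2\log\{h/(b - a)\})^{1/2} \Big( \sup_{u \in [a,b]} \Big| \frac{\hat\alpha_{kj}(u) - \alpha_{kj}(u) - \mathrm{bias}(\hat\alpha_{kj}(u) \mid \mathcal{D})}{[\mathrm{var}\{\hat\alpha_{kj}(u) \mid \mathcal{D}\}]^{1/2}} \Big| - \omega_n \Big) < x \Big]$
  $\qquad \longrightarrow \exp\{-2\exp\{-x\}\},$

where

  $\omega_n = (-2\log\{h/(b - a)\})^{1/2} + \frac{1}{(-2\log\{h/(b - a)\})^{1/2}} \Big[ \log\frac{K^2(c_0)}{\nu_0 \pi^{1/2}} + \frac{1}{2}\log\log\{(b - a)/h\} \Big]$

if $K(c_0) \ne 0$ and

  $\omega_n = (-2\log\{h/(b - a)\})^{1/2} + \frac{1}{(-2\log\{h/(b - a)\})^{1/2}} \log\Big\{ \frac{1}{4\nu_0\pi} \int (\dot K(t))^2\,dt \Big\}$

if $K(c_0) = 0$, $K(t)$ is absolutely continuous and $K^2(t)$, $(\dot K(t))^2$ are integrable
on $(-\infty, +\infty)$.

Remark. Theorem 6 gives the distribution of the maximum discrepancy
between the estimated functional coefficient and the true coefficient. It is the
basis for constructing the hypothesis test or confidence band. Theorem 6 also
applies to the estimator of any component of $\beta(\cdot)$.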
4. Confidence bands and hypothesis test. In this section, we will investigate
how to construct the confidence bands for the functional coefficients
in model (1.4).

For model (1.4), we often wish to know if an estimated functional coefficient
is significantly different from zero or if the estimated functional
coefficient is really varying. More generally, we wish to test

(4.1)  $H_0: \alpha_{kj}(u) = \alpha_0(u) \longleftrightarrow H_1: \alpha_{kj}(u) \ne \alpha_0(u),$

where $\alpha_0(u)$ is a specific function. This kind of nonparametric testing problem
can be conveniently handled by using the generalized likelihood ratio
method (Fan, Zhang and Zhang [12]). Instead, our test statistics will be
based on the constructed confidence bands.

As the proposed confidence bands and test statistics involve the estimation
of the biases and variances of the proposed estimators of the functional
coefficients, we first construct the estimation procedure for the biases and
variances. Throughout this paper, for any functional vector $g(u)$, we use
$g^{(i)}(u)$ to denote the $i$th derivative of $g(u)$.

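4.1. Estimation for bias and variance. Following Fan and Gijbels [7], by
(2.2), we have, for $k = 0, \ldots, q$, that

(4.2)  $\mathrm{bias}(\hat\alpha_k(u) \mid \mathcal{D}) = (e_{(k+1),(q+1)}^T \otimes I_p,\ 0_{p \times (q+s)})(\mathcal{X}^T W \mathcal{X})^{-1} \mathcal{X}^T W R,$
       $\mathrm{bias}(\hat\beta(u) \mid \mathcal{D}) = (0_{q \times ((q+1)p)},\ I_q,\ 0_{q \times s})(\mathcal{X}^T W \mathcal{X})^{-1} \mathcal{X}^T W R,$

where $R = (R_{11}, \ldots, R_{1n_1}, \ldots, R_{m1}, \ldots, R_{mn_m})^T$ with

  $R_{ij} = X_{ij}^T \mathcal{A}_i(U_{ij}) + Z_i^T\{\beta(U_{ij}) - \beta(u) - \dot\beta(u)(U_{ij} - u)\},$
  $\mathcal{A}_i(U_{ij}) = \sum_{k=0}^{q} \{\alpha_k(U_{ij}) - \alpha_k(u) - \dot\alpha_k(u)(U_{ij} - u)\} z_{ik}, \quad z_{i0} = 1.$

By Taylor's expansion, we have

  $\alpha_k(U_{ij}) - \alpha_k(u) - \dot\alpha_k(u)(U_{ij} - u) \approx 2^{-1} \ddot\alpha_k(u)(U_{ij} - u)^2 + 6^{-1} \alpha_k^{(3)}(u)(U_{ij} - u)^3,$
  $\beta(U_{ij}) - \beta(u) - \dot\beta(u)(U_{ij} - u) \approx 2^{-1} \ddot\beta(u)(U_{ij} - u)^2 + 6^{-1} \beta^{(3)}(u)(U_{ij} - u)^3.$

This leads to

(4.3)  $R \approx (2^{-1} \mathcal{U}^2 \Gamma,\ 6^{-1} \mathcal{U}^3 \Gamma)(\ddot\theta(u)^T, \theta^{(3)}(u)^T)^T,$

where $\theta(u) = (\alpha_0(u)^T, \ldots, \alpha_q(u)^T, \beta(u)^T)^T$. The estimators of $\ddot\theta(u)$ and
$\theta^{(3)}(u)$ can be easily obtained by using a local cubic fit, in place of the local
linear fit in Section 2.1, with an appropriate pilot bandwidth $h^*$ $[= O(n^{-1/7})]$,
which is optimal for estimating $\ddot\theta(u)$. We denote the estimators by
$\hat{\ddot\theta}(u)$ and $\hat\theta^{(3)}(u)$. Substituting
them into (4.3), we obtain $\hat R$ and, hence, the estimated biases $\widehat{\mathrm{bias}}(\hat\alpha_k(u) \mid \mathcal{D})$
and $\widehat{\mathrm{bias}}(\hat\beta(u) \mid \mathcal{D})$.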

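We now derive an estimator of the variance of $\hat\alpha_k(u)$ and $\hat\beta(u)$. We notice
that the estimators are linear in $Y$, according to (2.2). Hence, we need only
to estimate $\mathrm{var}(Y)$. A natural estimator is

  $\widehat{\mathrm{var}}(Y \mid \mathcal{D}) = \hat\sigma^2 I_n + \mathrm{diag}(X_1 \hat\Sigma X_1^T, \ldots, X_m \hat\Sigma X_m^T).$

This, together with (2.2), gives us the estimators

  $\widehat{\mathrm{var}}(\hat\alpha_k(u) \mid \mathcal{D}) = A_k \widehat{\mathrm{var}}(Y \mid \mathcal{D}) A_k^T$ and $\widehat{\mathrm{var}}(\hat\beta(u) \mid \mathcal{D}) = B \widehat{\mathrm{var}}(Y \mid \mathcal{D}) B^T,$

with $A_k$ and $B$ defined in (2.2).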
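
The plug-in variance has a block-diagonal structure that is easy to assemble directly. The sketch below is an illustration under assumed inputs: `X_blocks` is a list of the cluster design matrices $X_i$ in the same order as the stacked response $Y$, and `A` is any linear smoother matrix such as $A_k$ or $B$. The function names are hypothetical.

```python
import numpy as np


def var_y_estimate(X_blocks, Sigma_hat, sigma2_hat):
    """sigma2_hat * I_n + diag(X_1 Sigma_hat X_1^T, ..., X_m Sigma_hat X_m^T)."""
    n = sum(Xi.shape[0] for Xi in X_blocks)
    V = sigma2_hat * np.eye(n)
    start = 0
    for Xi in X_blocks:
        ni = Xi.shape[0]
        # Add the within-cluster covariance induced by the random effect e_i.
        V[start:start + ni, start:start + ni] += Xi @ Sigma_hat @ Xi.T
        start += ni
    return V


def var_linear_estimator(A, V):
    """Variance A V A^T of a linear estimator A Y when var(Y) = V."""
    return A @ V @ A.T
```

For large n a dense n x n matrix becomes wasteful; accumulating $A V A^T$ cluster by cluster would avoid it, at the cost of slightly longer code.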

4.2. Confidence bands and hypothesis testing. We first state the theorem,
based on which the confidence bands and hypothesis tests are constructed.

Theorem 7. Under the conditions (1)-(9) in the Appendix, with $K(t)$
having a compact support $[-c_0, c_0]$ and $h = n^{-\rho}$, $1/5 < \rho < 1/3$, if, furthermore,
$\alpha_{kj}^{(3)}(\cdot)$ is continuous on $[a, b]$ and the pilot bandwidth $h^*$ is of order $n^{-1/7}$,
then, for all $u \in [a, b]$, we have

  $P\Big[(-2\log\{h/(b - a)\})^{1/2} \Big( \sup_{u \in [a,b]} \Big| \frac{\hat\alpha_{kj}(u) - \alpha_{kj}(u) - \widehat{\mathrm{bias}}(\hat\alpha_{kj}(u) \mid \mathcal{D})}{[\widehat{\mathrm{var}}\{\hat\alpha_{kj}(u) \mid \mathcal{D}\}]^{1/2}} \Big| - \omega_n \Big) < x \Big]$
  $\qquad \longrightarrow \exp\{-2\exp\{-x\}\},$

where $\omega_n$ is exactly the same as that in Theorem 6.

Remark. If the component of $\beta^{(3)}(\cdot)$ is continuous on $[a, b]$, then Theorem 7
applies to its estimator as well.

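Based on Theorem 7, the $1 - \alpha$ confidence bands of $\alpha_{kj}(u)$ can be easily
constructed as

  $\big(\hat\alpha_{kj}(u) - \widehat{\mathrm{bias}}(\hat\alpha_{kj}(u) \mid \mathcal{D}) \pm \Delta_{1,\alpha}(u)\big),$

where

  $\Delta_{1,\alpha}(u) = \big(\omega_n + [\log 2 - \log\{-\log(1 - \alpha)\}](-2\log\{h/(b - a)\})^{-1/2}\big) \times \{\widehat{\mathrm{var}}(\hat\alpha_{kj}(u) \mid \mathcal{D})\}^{1/2}.$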
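
The half-width of the band is a direct plug-in computation once $\omega_n$ and the estimated standard deviation at $u$ are available. In the sketch below both are taken as inputs, so nothing about the kernel is assumed; the function names are illustrative, and `sd` stands for $\{\widehat{\mathrm{var}}(\hat\alpha_{kj}(u) \mid \mathcal{D})\}^{1/2}$. The helper `critical_value` computes $\log 2 - \log\{-\log(1 - \alpha)\}$, written in the equivalent form $-\log\{-0.5\log(1 - \alpha)\}$.

```python
import math


def critical_value(alpha):
    """-log{-0.5 log(1 - alpha)}, which equals log 2 - log{-log(1 - alpha)}."""
    return -math.log(-0.5 * math.log(1.0 - alpha))


def band_half_width(omega_n, sd, h, a, b, alpha=0.05):
    """Half-width of the 1 - alpha band at one point u.

    omega_n: centring constant from Theorem 6 (supplied by the caller);
    sd: estimated standard deviation of the coefficient estimate at u;
    h: bandwidth; [a, b]: interval over which the band is built (h < b - a).
    """
    scale = math.sqrt(-2.0 * math.log(h / (b - a)))
    return (omega_n + critical_value(alpha) / scale) * sd
```

The band at $u$ is then the bias-corrected estimate plus or minus this half-width.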
It is worthwhile to mention that Xia [29] investigated bias-corrected confidence
bands for univariate nonparametric regression.

By Theorem 7, hypothesis (4.1) can be tested by using the test statistic

  $(-2\log\{h/(b - a)\})^{1/2} \Big( \sup_{u \in [a,b]} \Big| \frac{\hat\alpha_{kj}(u) - \alpha_0(u) - \widehat{\mathrm{bias}}(\hat\alpha_{kj}(u) \mid \mathcal{D})}{[\widehat{\mathrm{var}}\{\hat\alpha_{kj}(u) \mid \mathcal{D}\}]^{1/2}} \Big| - \omega_n \Big)$

and rejecting $H_0$ when the test statistic exceeds the asymptotic critical value
$c_\alpha = -\log\{-0.5\log(1 - \alpha)\}$. Similarly, we may want to ask whether a specific
functional coefficient is really varying. This amounts to testing the composite
null hypothesis

(4.4)  $H_0: \alpha_{kj}(u) = C_{kj} \longleftrightarrow H_1: \alpha_{kj}(u) \ne C_{kj}.$

Based on Theorems 5 and 7, we test the problem (4.4) by computing the
statistic

  $(-2\log\{h/(b - a)\})^{1/2} \Big( \sup_{u \in [a,b]} \Big| \frac{\hat\alpha_{kj}(u) - \hat C_{kj} - \widehat{\mathrm{bias}}(\hat\alpha_{kj}(u) \mid \mathcal{D})}{[\widehat{\mathrm{var}}\{\hat\alpha_{kj}(u) \mid \mathcal{D}\}]^{1/2}} \Big| - \omega_n \Big)$

and rejecting $H_0$ for large values of the test statistic.



5. Simulation study. In this section, we use a simulated
example to demonstrate how well the proposed estimation method works
and examine how much loss one could incur by ignoring the structure of
$a_i(\cdot)$ when the structure holds. We first demonstrate the accuracy of the
proposed estimators.

In model (1.4), we take $p = 3$, $q = 2$ and $m = 100$. The cluster sizes $n_i$
($i = 1, \ldots, m$) are generated by the integer part of $|2\xi| + 6$, $\xi \sim N(0, 1)$. The
covariates $\{X_{ij}\}$ are independently generated from $N(0, I_p)$, $\{Z_i\}$ are independently
generated from $N(0, I_q)$ and $\{U_{ij}\}$ are independently generated
from $U(0, 1)$. We also set the random effect $e_i$ following the normal distribution
$N(0_p, \Sigma)$, measurement error $\varepsilon_{ij}$ following normal distribution $N(0, \sigma^2)$
where $\Sigma = (\sigma_{ij}) = 0.5^2 I_p$ and $\sigma = 0.5$. We set $\beta_0(u) = \sin(2\pi u)$ (intercept
term), $\alpha_0(u) = (\alpha_{01}(\cdot), \alpha_{02}(\cdot), \alpha_{03}(\cdot))^T$ a vector with each component being
$\sin(2\pi u)$, $\alpha_1(u) = (\alpha_{11}(\cdot), \alpha_{12}(\cdot), \alpha_{13}(\cdot))^T$ a vector with each component being
$\cos(2\pi u)$, $\alpha_2(u) = (\alpha_{21}(\cdot), \alpha_{22}(\cdot), \alpha_{23}(\cdot))^T$ a vector with each component
being $\sin(\pi u)$, $\beta(u) = (\beta_1(\cdot), \beta_2(\cdot))^T$ a vector with each component being
$\sin(2\pi u)$.
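
The design described above can be reproduced with a short data generator. This is a sketch of one way to draw such a sample, not the authors' simulation code; the function name, the seed handling through NumPy's `default_rng` and the flat-array output layout are assumptions. The number of cluster-level covariates is fixed at q = 2 because the coefficient functions are specified only for that case.

```python
import numpy as np


def simulate_cluster_data(m=100, p=3, sigma=0.5, sd_e=0.5, seed=0):
    """Draw one data set from model (1.4) with an intercept, following Section 5.

    Returns y (n,), X (n, p), Z (m, 2), U (n,) and cluster (n,), the cluster index
    of every observation. sd_e is the standard deviation of each random effect.
    """
    rng = np.random.default_rng(seed)
    # Cluster sizes: integer part of |2 xi| + 6 with xi ~ N(0, 1).
    n_i = np.floor(np.abs(2.0 * rng.standard_normal(m)) + 6.0).astype(int)
    cluster = np.repeat(np.arange(m), n_i)
    n = int(n_i.sum())
    X = rng.standard_normal((n, p))
    Z = rng.standard_normal((m, 2))
    U = rng.uniform(size=n)
    e = sd_e * rng.standard_normal((m, p))      # random effects e_i ~ N(0, sd_e^2 I_p)
    eps = sigma * rng.standard_normal(n)        # measurement errors
    s2, c2, s1 = np.sin(2 * np.pi * U), np.cos(2 * np.pi * U), np.sin(np.pi * U)
    z_obs = Z[cluster]
    # Every component of alpha_0, alpha_1, alpha_2 is sin(2 pi u), cos(2 pi u), sin(pi u).
    common = s2 + c2 * z_obs[:, 0] + s1 * z_obs[:, 1]
    a = common[:, None] + e[cluster]            # a_i(U_ij), shape (n, p)
    # Intercept beta_0(u) and both components of beta(u) are sin(2 pi u).
    y = s2 + np.sum(X * a, axis=1) + s2 * z_obs.sum(axis=1) + eps
    return y, X, Z, U, cluster
```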

For any function or functional vector $g(\cdot)$, if $\hat g(\cdot)$ is an estimator of $g(\cdot)$,
we define the mean integrated squared error (MISE) of $\hat g(\cdot)$ as

  $E\Big\{\int \|\hat g(u) - g(u)\|^2\,du\Big\},$

where $\|b\|^2 = b^T b$ and use it to assess the accuracy of the estimators.
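
In a simulation the MISE is approximated by averaging the integrated squared error over replications, with the integral evaluated numerically. The sketch below uses a trapezoid rule on a user-supplied grid; the function names and the grid-based representation of the curves are assumptions made for illustration.

```python
import numpy as np


def integrated_squared_error(g_hat, g, grid):
    """Trapezoid-rule approximation of the integral of ||g_hat(u) - g(u)||^2 over the grid.

    g_hat, g: values on the grid, shape (len(grid),) or (len(grid), d).
    """
    diff = np.asarray(g_hat, dtype=float) - np.asarray(g, dtype=float)
    sq = diff ** 2 if diff.ndim == 1 else np.sum(diff ** 2, axis=1)
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))


def mise(estimates, g, grid):
    """Monte Carlo MISE: average integrated squared error over the replications."""
    return float(np.mean([integrated_squared_error(e, g, grid) for e in estimates]))
```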
The proposed estimation method is employed to estimate the functional
coefficients. The kernel function is taken to be the Epanechnikov kernel
$0.75(1 - t^2)_+$ and the bandwidth is taken to be 0.15. The MISEs of the proposed
estimators of unknown functions are presented in Table 1, based on 100 simulations.
The MSEs of the estimators of $\Sigma$ and $\sigma^2$ are presented in Table 2.
From Tables 1 and 2, we can see the proposed estimators are quite accurate.

To visualize the accuracy of the proposed estimators, among the 100 simulations,
we single out the one with median performance and plot the estimated
functions together with their 95% confidence bands in Figure 1. It
shows, again, that the proposed estimators are very accurate.

Now, we turn to examine how much loss one could incur when ignoring the
structure of $a_i(\cdot)$, namely, examining the gain of our model (1.4). We now
assume the cluster effects $e_i = 0$. Given the sizes of the clusters in the above
simulated example, many clusters would be too small for one to estimate the
corresponding $a_i(\cdot)$ when ignoring the structure of $a_i(\cdot)$. So, we now assume
all clusters share the same size of 50, and keep the number of clusters at 100.
Except for the change in the sizes of the clusters and the assumption of $e_i = 0$,
all remain the same as in the above simulated example.

We use $\mathrm{MISE}_{1,i}$ to denote the MISE of the estimator of $a_i(\cdot)$ obtained
by the proposed estimation method and $\mathrm{MISE}_{2,i}$ to denote the MISE of the
estimator obtained without using the structure of $a_i(\cdot)$; that is, regarding
$a_i(\cdot)$ as a free unknown function and estimating it based on the first part of
model (1.4). The ratio



is used to assess the loss incurred on the estimation for $a(\cdot) = (a_1(\cdot), \ldots, a_m(\cdot))$
due to ignoring the structure of $a_i(\cdot)$, $i = 1, \ldots, m$. We compute the RMISEs




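Table 1
The MISEs of the estimators

Estimator   $\hat\beta_0(\cdot)$   $\hat\alpha_{01}(\cdot)$   $\hat\alpha_{02}(\cdot)$   $\hat\alpha_{03}(\cdot)$   $\hat\alpha_{11}(\cdot)$   $\hat\alpha_{12}(\cdot)$
MISE        0.013    0.019    0.019    0.020    0.019    0.020

Estimator   $\hat\alpha_{13}(\cdot)$   $\hat\alpha_{21}(\cdot)$   $\hat\alpha_{22}(\cdot)$   $\hat\alpha_{23}(\cdot)$   $\hat\beta_1(\cdot)$   $\hat\beta_2(\cdot)$
MISE        0.019    0.016    0.017    0.016    0.014    0.014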

Table 2
The MSEs of the estimators

Estimator   $\hat\sigma_{11}$   $\hat\sigma_{12}$   $\hat\sigma_{13}$   $\hat\sigma_{22}$   $\hat\sigma_{23}$   $\hat\sigma_{33}$   $\hat\sigma^2$
MSE         0.0055   0.0013   0.0013   0.0052   0.0015   0.0054   0.0029
[Figure 1: twelve panels, one per coefficient $\beta_0(u)$, $\alpha_{01}(u)$, $\alpha_{02}(u)$, $\alpha_{03}(u)$, $\alpha_{11}(u)$, $\alpha_{12}(u)$, $\alpha_{13}(u)$, $\alpha_{21}(u)$, $\alpha_{22}(u)$, $\alpha_{23}(u)$, $\beta_1(u)$, $\beta_2(u)$, each plotted against $u$ on $[0, 1]$.]

Fig. 1. The solid lines are the true curves, the dashed lines are the estimators, and the
dotted lines are 95% confidence bands.



for different bandwidths and plot the obtained RMISEs against the bandwidths
in Figure 2. It is clear that the RMISE is almost 0 when the bandwidth
is less than 0.3, and it never goes beyond 0.25, which suggests the loss
is significant. Similarly, we define the RMISE for $\beta(\cdot)$ and plot the RMISEs
against different bandwidths. It again shows that the loss incurred on the estimation
for $\beta(\cdot)$ due to ignoring the structure of $a_i(\cdot)$, $i = 1, \ldots, m$, is
still significant.
Fig. 2. The left panel is the plot of the RMISEs of $a(\cdot)$ against bandwidths, the middle
panel is the plot of the RMISEs of $\beta(\cdot)$, and the right panel is the density function of the
estimated random effects in real data analysis.



6. Real data analysis. The data we study come from the Bangladesh 
Demographic and Health Survey (BDHS) of 1996-1997 (Mitra et al. [22]), 
which is a cross-sectional, nationally representative survey of ever-married 
women aged between 10 and 49. The analysis is based on a sample of 8189 
women nested within 296 primary sampling units, or clusters, with sample 
sizes ranging from 16 to 58. We allow for the hierarchical structure of the data by
fitting a two-level model, with women at level 1 nested within clusters at level
2. A further hierarchical level is the administrative division. Bangladesh is
divided into six administrative divisions: Barisal, Chittagong, Dhaka, Khulna,
Rajshahi and Sylhet. Effects at this level are represented in the model by
fixed effects, since there are only six divisions.

The dependent variable yij is the duration in months between the first 
birth and the second birth for the jth woman in the ith cluster. We consider 
several covariates that are commonly found to be associated with fertility 
in Bangladesh. The selected individual-level categorical covariates include 
the woman's level of education (none coded by and primary or secondary 
plus coded by 1) denoted by x%j\i religion (Hindu coded by 1 and Muslim or 
other coded by 0) denoted by Xij2, first child (boy coded by 1) denoted by 
Xijs. Xij = (xiji, Xij2, Xijs) T . Another individual-level covariate is the year of 
marriage (JJij). We also consider two cluster-level variables, administrative 
division and type of region of residence (rural coded by 1 and urban coded 
by 0). We take urban as the reference, and the differences between urban 
and rural clusters are modeled by dummy variables zn. We take Barisal as 
the reference, and the differences among the six administrative divisions are 
modeled by a set of dummy variables zu, I = 2, ... ,6. 

The proposed model (1.4) is used to fit the data, and the proposed estimation 
procedure is employed to estimate $\alpha_j(\cdot) = (\alpha_{j1}(\cdot), \alpha_{j2}(\cdot), \alpha_{j3}(\cdot))^T$, 
$j = 0, \ldots, 6$, and $\beta(\cdot) = (\beta_1(\cdot), \ldots, \beta_q(\cdot))^T$. The kernel involved in the estimation 
is taken to be the Epanechnikov kernel, and the bandwidth is chosen to be 
35% of the range of $U_{ij}$. 
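The kernel and bandwidth choice just described can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy years of marriage are hypothetical, and the rescaled kernel $K_h(t) = K(t/h)/h$ is the usual convention rather than something confirmed in this section.

```python
def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def bandwidth_from_range(u_values, fraction=0.35):
    """Bandwidth taken as a fixed fraction (35% here) of the range of U."""
    return fraction * (max(u_values) - min(u_values))


def kernel_weights(u_values, u0, h):
    """Weights K_h(U - u0) = K((U - u0) / h) / h used in a local fit at u0."""
    return [epanechnikov((u - u0) / h) / h for u in u_values]


# Hypothetical years of marriage, for illustration only (not the BDHS data).
years = [1955, 1962, 1970, 1978, 1985, 1995]
h = bandwidth_from_range(years)          # 35% of a 40-year range
weights = kernel_weights(years, 1970, h)
```

Observations whose year of marriage is more than one bandwidth away from the target year receive zero weight, which is why a fairly wide bandwidth is convenient when the index variable is sparse at the ends of its range.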
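Table 3 
The P-values of the coefficients being constant 

Coefficient   $\beta_0(\cdot)$       $\alpha_{01}(\cdot)$   $\alpha_{02}(\cdot)$   $\alpha_{03}(\cdot)$   $\alpha_{11}(\cdot)$   $\alpha_{12}(\cdot)$ 
P-value       0.102                  0.005                  0.008                  0.248                  0.094                  0.217 

Coefficient   $\alpha_{13}(\cdot)$   $\alpha_{21}(\cdot)$   $\alpha_{22}(\cdot)$   $\alpha_{23}(\cdot)$   $\alpha_{31}(\cdot)$   $\alpha_{32}(\cdot)$ 
P-value       0.082                  0.074                  0.213                  0.060                  0.618                  0.038 

Coefficient   $\alpha_{33}(\cdot)$   $\alpha_{41}(\cdot)$   $\alpha_{42}(\cdot)$   $\alpha_{43}(\cdot)$   $\alpha_{51}(\cdot)$   $\alpha_{52}(\cdot)$ 
P-value       0.283                  0.235                  0.031                  0.135                  0.263                  0.020 

Coefficient   $\alpha_{53}(\cdot)$   $\alpha_{61}(\cdot)$   $\alpha_{62}(\cdot)$   $\alpha_{63}(\cdot)$   $\beta_1(\cdot)$       $\beta_2(\cdot)$ 
P-value       0.052                  0.052                  0.148                  0.269                  0.372                  0.174 

Coefficient   $\beta_3(\cdot)$       $\beta_4(\cdot)$       $\beta_5(\cdot)$       $\beta_6(\cdot)$ 
P-value       0.069                  0.035                  0.023                  0.073 
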

First, we examine how strong the random effect on each cluster is. To 
this end, we estimate the random effect of each cluster. The 
density function of the norm of the random effects of all clusters is presented 
in Figure 2. It is easy to see that the random effects are close to zero. This 
indicates that the within-cluster correlation has been mainly accounted for by 
the deterministic cluster effect in (1.4). 

As we mentioned before, if a coefficient is treated as a function when it 
is in fact constant, we pay a price in the variances of the resulting estimators. 
Thus, for each coefficient in the model, the proposed hypothesis test is 
employed to test whether it is constant or not. The P-value for each coefficient 
is reported in Table 3. At the 5% level, Table 3 shows that $\alpha_{01}(\cdot)$, $\alpha_{02}(\cdot)$, $\alpha_{32}(\cdot)$, $\alpha_{42}(\cdot)$, 
$\alpha_{52}(\cdot)$, $\beta_4(\cdot)$ and $\beta_5(\cdot)$ are nonconstant, and the others are constant. From 
now on, we use $\alpha_{kj}$ to denote a constant function $\alpha_{kj}(\cdot)$ and $\beta_l$ for a constant 
function $\beta_l(\cdot)$. 

Nonconstancy indicates the presence of nonlinear interactions with 
the year of marriage, while constancy indicates that no such interactions are 
present. 
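The screening step described above amounts to comparing each P-value in Table 3 with a threshold. The sketch below assumes a 5% level (the text does not state the level explicitly) and uses hypothetical ASCII labels; the subscripts attached to the beta coefficients follow their position in Table 3 and are an assumption, since those labels are not fully legible in the source.

```python
# P-values copied from Table 3, in table order. Labels are illustrative;
# the beta subscripts are inferred from position and are an assumption.
p_values = {
    "beta_0": 0.102, "alpha_01": 0.005, "alpha_02": 0.008, "alpha_03": 0.248,
    "alpha_11": 0.094, "alpha_12": 0.217, "alpha_13": 0.082, "alpha_21": 0.074,
    "alpha_22": 0.213, "alpha_23": 0.060, "alpha_31": 0.618, "alpha_32": 0.038,
    "alpha_33": 0.283, "alpha_41": 0.235, "alpha_42": 0.031, "alpha_43": 0.135,
    "alpha_51": 0.263, "alpha_52": 0.020, "alpha_53": 0.052, "alpha_61": 0.052,
    "alpha_62": 0.148, "alpha_63": 0.269, "beta_1": 0.372, "beta_2": 0.174,
    "beta_3": 0.069, "beta_4": 0.035, "beta_5": 0.023, "beta_6": 0.073,
}

LEVEL = 0.05  # assumed significance level

# Coefficients for which constancy is rejected are kept as functions.
nonconstant = [name for name, p in p_values.items() if p < LEVEL]
constant = [name for name, p in p_values.items() if p >= LEVEL]
```

With these inputs, seven coefficients are flagged as nonconstant and the remaining twenty-one are treated as constants, which matches the number of entries in Table 4.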

The proposed estimation procedure is used to estimate the constant and 
functional coefficients. The estimated constant coefficients and their standard 
errors, computed by the leave-one-cluster-out jackknife, are presented 
in Table 4, and the estimated functional coefficients together with their 95% 
confidence bands are presented in Figure 3. 
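A leave-one-cluster-out jackknife can be sketched generically as below. This is a minimal illustration, not the authors' implementation: the estimator passed in (`mean_of_cluster_means`) and the toy data are hypothetical stand-ins for the much more involved semiparametric fit, and the variance formula is the standard delete-one jackknife with clusters as the deletion units.

```python
import math


def jackknife_se(clusters, estimator):
    """Delete-one-cluster jackknife standard error of a scalar estimator.

    clusters  : list of clusters, each a list of observations
    estimator : function mapping a list of clusters to a float
    """
    m = len(clusters)
    # Re-estimate with each cluster removed in turn.
    leave_one_out = [estimator(clusters[:i] + clusters[i + 1:]) for i in range(m)]
    centre = sum(leave_one_out) / m
    variance = (m - 1) / m * sum((t - centre) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


def mean_of_cluster_means(clusters):
    """Toy estimator: unweighted average of the cluster means."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


# Hypothetical birth intervals (months) in four small clusters.
toy_clusters = [[30.0, 34.0], [28.0, 26.0, 27.0], [40.0, 38.0], [33.0]]
se = jackknife_se(toy_clusters, mean_of_cluster_means)
```

Deleting whole clusters rather than single observations keeps the within-cluster dependence intact in every replicate, which is the reason for jackknifing at the cluster level.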

It can be seen from Table 4 that some constant coefficients are not significantly 
different from zero. This has practical implications. For example, 
$\alpha_{41}$ (its estimated value is $-0.014$ with standard error 0.016) not being significantly 
different from zero indicates that the impact of education in the division 
of Khulna is not significantly different from that in Barisal. 
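The statement about $\alpha_{41}$ can be checked with a simple ratio of estimate to standard error. The sketch below assumes an approximately normal estimator and a two-sided 5% level; both are assumptions made for illustration, and only the two numbers are taken from the text.

```python
estimate = -0.014        # estimated alpha_41, from the text
standard_error = 0.016   # its jackknife standard error, from the text

z_ratio = estimate / standard_error
critical = 1.96          # assumed two-sided 5% normal critical value

lower = estimate - critical * standard_error
upper = estimate + critical * standard_error
significant = abs(z_ratio) > critical
```

The ratio is well inside the critical value and the interval covers zero, in line with the conclusion drawn in the text.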

As explained in the Introduction, the proposed modeling and 
estimation methods mainly serve inference for a particular cluster. 
We now use a cluster in the rural area of Chittagong to illustrate 
how the proposed method works. 

Based on the model (1.4) and the estimators of the unknown coefficients 
involved, the impact of education on the second birth interval in a 
rural area in Chittagong is 

$$\hat a_1(U) = \hat\alpha_{01}(U) + \hat\alpha_{11} + \hat\alpha_{21}.$$ 

Similarly, the impact of being Hindu is 

$$\hat a_2(U) = \hat\alpha_{02}(U) + \hat\alpha_{12} + \hat\alpha_{22},$$ 

and the impact of the first child being a boy in this area is 

$$\hat a_3(U) = \hat\alpha_{03} + \hat\alpha_{13} + \hat\alpha_{23}.$$ 
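Because all three terms in the last expression are constants, the first-child effect for a rural cluster in Chittagong can be computed directly from Table 4. The short check below uses the Table 4 estimates; the variable names are illustrative only.

```python
# Constant estimates read from Table 4.
alpha_03 = 0.016   # baseline effect of the first child being a boy
alpha_13 = 0.011   # adjustment for a rural cluster
alpha_23 = 0.101   # adjustment for the Chittagong division

a3_rural_chittagong = alpha_03 + alpha_13 + alpha_23
a3_rounded = round(a3_rural_chittagong, 3)
```

The sum agrees with the value 0.128 that the text reports for this effect.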

The functional coefficients $\hat a_1(\cdot)$ and $\hat a_2(\cdot)$ are presented in Figure 3. The 
second one in the bottom panel in Figure 3 is $\hat a_1(\cdot)$, which shows that, in the 
rural area of Chittagong, even the educated women still have a shorter second 
birth interval than the uneducated women in the urban area of Barisal before 
1970. This indicates that administrative division and the type of region of 
residence played a very important role in fertility behavior in Bangladesh 
before 1970. It is also noticeable that the second birth intervals of the educated 
women are getting longer. 

From the third one in the bottom panel in Figure 3, which is $\hat a_2(\cdot)$, we can 
see that, even in the rural area of Chittagong, Hindus still have longer second 
birth intervals than Muslims in the urban area of Barisal after 1970. 
This suggests that religion plays an important role in fertility behavior in 
Bangladesh. 

The impact $\hat a_3(\cdot)$ of the first child being a boy in the rural area of Chittagong 
does not vary with time. It is 0.128, which suggests that, even in the rural 
area of Chittagong, the women whose first child is a boy still have longer 



Table 4 
Estimated constant coefficients 

Coefficient   $\beta_0$     $\alpha_{03}$   $\alpha_{11}$   $\alpha_{12}$   $\alpha_{13}$   $\alpha_{21}$   $\alpha_{22}$ 
Estimate      3.530         0.016           -0.037          0.105           0.011           0.036           0.134 
SE            0.004         0.005           0.010           0.004           0.003           0.013           0.006 

Coefficient   $\alpha_{23}$ $\alpha_{31}$   $\alpha_{33}$   $\alpha_{41}$   $\alpha_{43}$   $\alpha_{51}$   $\alpha_{53}$ 
Estimate      0.101         -0.060          0.042           -0.014          0.111           -0.091          0.104 
SE            0.005         0.006           0.005           0.016           0.007           0.009           0.005 

Coefficient   $\alpha_{61}$ $\alpha_{62}$   $\alpha_{63}$   $\beta_1$       $\beta_2$       $\beta_3$       $\beta_6$ 
Estimate      -0.048        0.171           -0.005          -0.043          -0.103          -0.022          -0.093 
SE            4.149         0.010           0.007           0.002           0.004           0.004           0.006 

Note: SE stands for standard error of the estimator. 
Fig. 3. The solid lines are the estimated functional coefficients, and the dotted lines are the 
95% confidence bands. 



second birth intervals than the women whose first child is a girl or has died, 
even in the urban area of Barisal. A possible interpretation is that, as in many 
developing countries, the culture in Bangladesh favors boys. 

APPENDIX 

In this section, we derive the asymptotic distributions of the proposed 
estimators. For ease of exposition, we write 

$\varepsilon = (\varepsilon_1^T, \ldots, \varepsilon_m^T)^T$, $\mathbf{x} = \mathrm{diag}(X_1, \ldots, X_m)$, $e = (e_1^T, \ldots, e_m^T)^T$, 
$\theta(u) = (\alpha_0(u)^T, \alpha_1(u)^T, \ldots, \alpha_q(u)^T, \beta(u)^T)^T$, 

T (u) = {e T {k+l)Aq+l) <g> I p }[a 2 Ai(u)- 1 + Ti(u)~i(u)Ti(u) T ]{e (fc+1)i{( , +1) tg> I p } 






W. ZHANG, J. FAN AND Y. SUN 



and 

where $s = (q + 1)p + q$ is defined in (2.2). 

Moreover, for any function $g(u)$ on the interval $[a, b]$, define $\|g\|_\infty = 
\sup_{u \in [a,b]} |g(u)|$, and, for any matrix $A(u) = (a_{ij}(u))_{p \times p}$, set 

$$\|A\|_\infty = \Bigl( \sum_{i=1}^{p} \sum_{j=1}^{p} \|a_{ij}\|_\infty^2 \Bigr)^{1/2}.$$ 

The following technical conditions are imposed to establish the asymptotic 
results: 

(1) Ee\i < oo,£'||ei|| 4 < oo, Ezj < oo, Exj d < oo for some d > 2, where 

|| e i|| 2 — e i" e i) z j denotes the jth element of Z and Xj denotes the ith 
element of X for j = 1, . . . , g,i = 1, . . . ,p; 

(2) akj(-) is continuous in a neighborhood of it, for k = 0, . . . ,q, j = 1, . . . ,p, 
where cekj(') is the jth. element of £*&(•), and assume otkj{u) ^ 0; simi- 
larly, /?;(•) is continuous in a neighborhood of u, for Z = 1, . . . ,q, where 

is the Ith. element of /3(-), and $i(u) ^ 0; 

(3) The marginal density $f(\cdot)$ of $U$ has a continuous derivative in some 
neighborhood of $u$, and $f(u) \neq 0$; 

(4) Each element of Q(u) and Si (it) are continuous in the neighborhood of 
u, and Q(u) is positive definite at the point u; 

(5) The function $K(t)$ is a symmetric density function with a compact support; 

(6) $n_i$, $i = 1, \ldots, m$, are bounded, $n \to \infty$, $h \to 0$ and $nh^2 \to \infty$; 

(7) $E\|(X_i^T X_i)^{-1}\|^{4+\delta} < \infty$ for some $\delta > 0$ and $i = 1, \ldots, m$, where $\|\cdot\|$ denotes 
the Frobenius norm of a matrix; 

(8) E[X^TiXi r riiTf r \Uii = u,Ui r = v] is continuous in the neighborhood of 
u and v respectively for r ^ I, r, I = 1, . . . , m, % = 1, . . . , m; 

(9) $\Omega(u)$ is bounded and has a continuous derivative on $[a, b]$, $\Gamma(u)$ has a 
bounded derivative on $[a, b]$, $0 < \|\Gamma(u)\|_\infty < \infty$ and the first derivative 
of $K(t)$ has a finite number of sign changes over its support. 

Note that $\Omega(u)$ is automatically positive semidefinite at the point $u$, so 
the second part of condition (4) is easily satisfied. 

To obtain the proof of the theorems, the following lemmas are required. 

Lemma 1. Let $\{U_{ij}\}$ be i.i.d. random variables, $\{\xi_{ij}\}$ be identically distributed 
and $\xi_{ij}$ be independent of $\xi_{lk}$ for $i \neq l$. Assume that the marginal 
density $f(\cdot)$ of $U$ has a continuous derivative in some neighborhood of $u$, 
$E(\xi_{11} | U_{11} = u)$ is continuous in the neighborhood of $u$ and $E(\xi_{11}^2) < \infty$. Let 
$K(\cdot)$ be a bounded positive function with a bounded support. Then, when 
$nh^2 \to \infty$, for $\lambda = 0, 1, 2, \ldots$, 

$$n^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \xi_{ij} (h^{-1}(U_{ij} - u))^{\lambda} K_h(U_{ij} - u) 
= \mu_\lambda f(u) E(\xi_{11} | U_{11} = u)(1 + o_P(1)).$$ 

Proof. Let $S_{n,\lambda} = n^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \xi_{ij} (h^{-1}(U_{ij} - u))^{\lambda} K_h(U_{ij} - u)$ and 
$g(v) = E(\xi_{11} | U_{11} = v)$. Then, 

$$E S_{n,\lambda} = n^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} E\{E[\xi_{ij} (h^{-1}(U_{ij} - u))^{\lambda} K_h(U_{ij} - u) | U_{ij}]\} 
= \int g(v) (h^{-1}(v - u))^{\lambda} K_h(v - u) f(v) \, dv = \mu_\lambda f(u) g(u)(1 + o(1))$$ 

by continuity of both the density function $f(\cdot)$ and the conditional expectation 
function $g(\cdot)$ in the neighborhood of $u$. Moreover, as $K(\cdot)$ is a bounded 
function with a bounded support, $|u^{\lambda} K(u)|$ is bounded. Then, by the Jensen 
inequality, it follows that 

m rii 

var(S n , A ) < ESl x < n" 2 ^^^" 1 ^ ~ u^K 2 ^ - u)] 

i=l j=l 

= 0((nh 2 r 1 ). 

Therefore, 

$$S_{n,\lambda} = E S_{n,\lambda} + O_P\bigl(\sqrt{\mathrm{var}(S_{n,\lambda})}\bigr) = \mu_\lambda f(u) E(\xi_{11} | U_{11} = u)(1 + o_P(1)). \qquad \Box$$ 
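The conclusion of Lemma 1 can be illustrated numerically. The sketch below is a Monte Carlo check under assumptions chosen purely for convenience: $U$ uniform on $[0, 1]$ (so $f(u) = 1$), independent observations (clusters of size one), $E(\xi | U = v) = 1 + v$ with small Gaussian noise, an Epanechnikov kernel, and $\mu_\lambda = \int t^{\lambda} K(t)\,dt$. None of these specific choices comes from the paper, and the function names are hypothetical.

```python
import random


def epanechnikov(t):
    """Epanechnikov kernel, a bounded kernel with support [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def kernel_moment(lam, grid=20000):
    """mu_lambda = integral of t^lambda K(t) dt, by the midpoint rule."""
    step = 2.0 / grid
    points = (-1.0 + (k + 0.5) * step for k in range(grid))
    return sum((t ** lam) * epanechnikov(t) for t in points) * step


def kernel_average(u0, h, lam, n, rng):
    """S_{n,lambda}: average of xi * ((U - u0)/h)^lambda * K_h(U - u0)."""
    total = 0.0
    for _ in range(n):
        u = rng.random()                        # U ~ Uniform(0, 1), density 1
        xi = 1.0 + u + rng.gauss(0.0, 0.1)      # E(xi | U = u) = 1 + u
        t = (u - u0) / h
        total += xi * (t ** lam) * epanechnikov(t) / h
    return total / n


rng = random.Random(0)
u0, h, n = 0.5, 0.05, 200000
s0 = kernel_average(u0, h, 0, n, rng)
s2 = kernel_average(u0, h, 2, n, rng)
limit0 = kernel_moment(0) * 1.0 * (1.0 + u0)    # mu_0 * f(u0) * g(u0)
limit2 = kernel_moment(2) * 1.0 * (1.0 + u0)    # mu_2 * f(u0) * g(u0)
```

For large `n` and small `h`, `s0` and `s2` should sit close to `limit0` and `limit2` respectively, which is the content of the lemma in this simplified setting.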

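Lemma 2. Let $\{\xi_n, n \geq 1\}$ be an independent sequence, with $E\xi_n = 0$ 
and 

$$\lambda E|\xi_j|^3 \exp(\lambda |\xi_j|) \leq E\xi_j^2$$ 

for any $j \geq 1$ and some $\lambda > 0$. If $\sum_{i=1}^{n} E\xi_i^2 \to \infty$, then, on a possibly enlarged 
probability space, there exists a sequence of independent random variables 
$\{\eta_n, n \geq 1\}$, $\eta_n \sim N(0, \mathrm{var}(\xi_n))$, such that 

$$\Bigl| \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \eta_i \Bigr| \leq \frac{1}{C\lambda} \log\Bigl( \sum_{i=1}^{n} E\xi_i^2 \Bigr),$$ 

where $C$ is a positive constant. 

See Lin and Lu [20], page 129, Theorem 2.6.3. 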




Proof of Theorem 1. It can be shown that 
\Jnhf(u)(a k (u) - a k (u)) 

= v /n/ l /( U )(ef fc+1)j((?+1) «)I p ,0 ? , x{g+s) )(X T ^X)- 1 X T W£ 

+ v /n/ l /(n)(ef fc+1)i((?+1) ®/ p ,O p><((?+s) )(X T ^X)- 1 X T ^xe 

+ y / n/ l /(n){(ef fc+1)i(g+1) ®/ p ,0 px(9+s) )(X T ^X)- 1 X T ^(y|P) 

- a k (u)} 

as 

Art = n(ef k+lUq+1) ® J p , O px((?+s) )(X T WX)- 1 J ff v /n- 1 / l /( U ){F- 1 X T iy£} 
with 

^-'ft/H^ff^X^e} = 2sx i 

and 

n _1 /t/(tt) coviH-VWe} = a 2 f{u)n' 1 hE{H~ 1 X T W 2 XH- 1 } 

= a 2 f 2 (u)^ °)®n(u)(l + o(l)). 
Moreover, it follows from Lemma 1 that 

(A.l) i(X T WX) = /(«)#{ (J ^)®n(u)}H(l + 0p (l)). 
Hence, 

n(eJ k+lUq+1) ® / P ,O px( , +s) )(X T WX)^ff 

= y^y((effc+i),(g+i) ®^,Opx g )^(u)" 1 ,O pxs )(l + op(l)). 

By conditions (1), (5) and (6), the Lindeberg-Feller theorem, Slutsky's theorem 
and the inverse of a block matrix, it follows that 

L nl N p (O pxl ,u a 2 {ef k+1) {g+1) ® J p }Ai(u)- 1 {e (fc+1)i( g + i ) <g> J p }). 
Similarly, it can be shown that 

^n2 -^A r p(O p xi,^o{ef fc+1)i(g+1) ®/p}©i(«){e( fc+ i),(g + i) O J p }). 

Since $L_{n1}$ and $L_{n2}$ are independent, $(L_{n1} + L_{n2})$ has the asymptotic 
distribution 

A r p(O p xi,^o{ef fe+1)i{(?+1) ® I p }[o 2 ki{u)- 1 + ©i(«)]{e( fc+ i) ) (,+i) ® 7 P }). 



SEMIPARAMETRIC MODEL FOR CLUSTER DATA 23 

By arguments similar to those used to establish Lemma 1, and by condition (2), we 
find that 

'e( u y 



X 1 WE{Y\V) - X 1 WX 

(A.2) 



e(u) 



-nh 2 fi2f{u)£l(u) 



2 , w . www 0(u)(l + o P (l)). 



This, together with (A.1), gives 

/i 2 \/nh b f{u) 
L n 3 = 2 a k(u){l + Op{±)). 

Therefore, when nh 5 is bounded, we obtain that 



\Jnhf(u)\ a k (u) - a k (u) - h 2 ^-a k (u) 



Np(O pX i, 

M e Jk+i),(q+i) ®/ p }[<t 2 Ai(u) _1 + e 1 (u)]{e ik+1))(q+1) ® I p }). 

□ 

Proof of Theorem 2. It follows immediately from the proof of The- 
orem 1. □ 

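Proof of Theorem 3. Let $\Delta r_i = \hat r_i - r_i$, $Q_i = I_{n_i} - P_i$ and $\tilde Q_i$ be the 
diagonal matrix generated from the diagonal elements of $Q_i$. Now, 

$$ 
\begin{aligned} 
\hat\sigma^2 &= \frac{1}{n - mp} \sum_{i=1}^{m} \varepsilon_i^T (Q_i - \tilde Q_i) \varepsilon_i + \frac{1}{n - mp} \sum_{i=1}^{m} \varepsilon_i^T \tilde Q_i \varepsilon_i \\ 
&\quad + \frac{1}{n - mp} \sum_{i=1}^{m} E[\Delta r_i | \mathcal{D}]^T Q_i E[\Delta r_i | \mathcal{D}] \\ 
&\quad + \frac{1}{n - mp} \sum_{i=1}^{m} \{\Delta r_i - E[\Delta r_i | \mathcal{D}]\}^T Q_i \{\Delta r_i - E[\Delta r_i | \mathcal{D}]\} \\ 
&\quad + \frac{2}{n - mp} \sum_{i=1}^{m} \{\Delta r_i - E[\Delta r_i | \mathcal{D}]\}^T Q_i E[\Delta r_i | \mathcal{D}] \qquad \text{(A.3)} \\ 
&\quad + \frac{2}{n - mp} \sum_{i=1}^{m} \varepsilon_i^T Q_i E[\Delta r_i | \mathcal{D}] \\ 
&\quad + \frac{2}{n - mp} \sum_{i=1}^{m} \varepsilon_i^T Q_i \{\Delta r_i - E[\Delta r_i | \mathcal{D}]\} \\ 
&\equiv J_{n1} + J_{n2} + \cdots + J_{n7}. 
\end{aligned} 
$$ 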
As $Q_i$ is an idempotent matrix and all the diagonal components of $Q_i - \tilde Q_i$ 
are equal to zero, by straightforward calculation, it follows that 

$$E(J_{n1} | \mathcal{D}) = \frac{\sigma^2}{n - mp} \sum_{i=1}^{m} \mathrm{tr}(Q_i - \tilde Q_i) = 0,$$ 

$$E(J_{n2} | \mathcal{D}) = \frac{\sigma^2}{n - mp} \sum_{i=1}^{m} \mathrm{tr}(\tilde Q_i) = \sigma^2,$$ 

$$\mathrm{cov}(J_{n1}, J_{n2} | \mathcal{D}) = E(J_{n1} J_{n2} | \mathcal{D}) = \frac{2\sigma^4}{(n - mp)^2} \sum_{i=1}^{m} \mathrm{tr}((Q_i - \tilde Q_i) \tilde Q_i) = 0$$ 


and 

n — mp ^ n — mp J 



var(J n2 ) = ^(J^|D)} - a 4 = ^£^[1 - Xf^f^X^. 

As J n i = (n-mp) -1 E^i{Er=iE"ii ir .^^7( x ? x i) _1 ^ir£ii£ir} is a sum 
of independent variables, by Lindeberg-Feller Theorem, it follows that 

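$$n^{1/2} J_{n1} \xrightarrow{D} N(0, 2\sigma^4 c_1 (c_1 - 1 - \gamma)).$$ 

Similarly, 

$$n^{1/2} (J_{n2} - \sigma^2) \xrightarrow{D} N(0, \mathrm{var}(\varepsilon_{11}^2) c_1 (2 - c_1 + \gamma)).$$ 

Since the two terms are uncorrelated, we have that 

(A.4) $$n^{1/2} (J_{n1} + J_{n2} - \sigma^2) \xrightarrow{D} N(0, 2\sigma^4 c_1 (c_1 - 1 - \gamma) + \mathrm{var}(\varepsilon_{11}^2) c_1 (2 - c_1 + \gamma)).$$ 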

In the following, we will show that the remaining terms $J_{n3}$ to $J_{n7}$ 
in (A.3) satisfy $n^{1/2} J_{nl} = o_P(1)$, $l = 3, \ldots, 7$. 

By the conditional bias of $\hat\theta(u)$ and the law of large numbers, it follows from 
$nh^8 \to 0$ that 

(A.5) $n^{1/2} J_{n3} = o_P(1)$. 

Since < 4> T Qicf) < </> r </> for any i and rij dimensional vector tp, we have 

m m 

E{ I J n4 1 1 V} < (n - mp) ~ 1 £ £ rg cov (0 ( ^ ) | V) r y = O p ( (nh) ~ 1 ) . 
By the conditional bias and covariance matrix of $\hat\theta(u)$, it follows that 

E{\J n5 \\V} 

_ h 2 fi2(l+Op(l)) 

(n — mp) 

{m rii—p m m 
EEEEi«* r '*)i 
i=l fc=l j=l r=l 

x E[\Tl{6(U l3 ) - E\e{U i3 )\V])\\V]^ 

h 2 ii 2 {l + op{l)) 
(n — mp) 

( rii rii \ 1/2 

Y,(m-p){ E( r P(^)) 2 E r 5 cov (^(^)i^) r ^ 



< 



in 

x 

i=l {r=l j=l 



Opdn-'h 3 ) 1 / 2 ). 



where 



and 



n i fir 

E] QirlQivl = $rv = | r ^ V 



E{j2 ^ v) ^ ^=^EE^(^)] 2 (i+°p(i))- 

By a similar expression as (A.l) and straightforward calculation, we have 
T^ie^) - E(e{U l3 )\V)]e ir 

= -¥ijr^ T l^ u ^ 1 J2J2 T ti £ tiK h (u tl - ua) U.(i+ p(i)). 

n J{vij) \t=ll=l J 

Moreover, by boundedness of the kernel function and independence of the 
random errors, it can be shown that $E\{|J_{n7}| \,|\, \mathcal{D}\} = O_P((nh)^{-1})$. 
Therefore, using the Markov inequality, when $h \to 0$ and $nh^2 \to \infty$, we get 

(A.6) $n^{1/2} J_{nl} = o_P(1)$, $l = 4, \ldots, 7$. 

Combining the results from (A.3) to (A.6), we have 

$$n^{1/2} \{\hat\sigma^2 - \sigma^2\} \xrightarrow{D} N(0, 2\sigma^4 c_1 (c_1 - 1 - \gamma) + \mathrm{var}(\varepsilon_{11}^2) c_1 (2 - c_1 + \gamma)). \qquad \Box$$ 
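The degrees-of-freedom divisor $n - mp$ in the proof rests on two properties of $Q_i$: it is idempotent, and its trace is $n_i - p$. The sketch below checks both numerically under an assumption that is not stated in this section, namely that $P_i = X_i (X_i^T X_i)^{-1} X_i^T$ is the projection onto the column space of the cluster design $X_i$. To stay within the standard library it uses $p = 1$ (a single-column design); the designs are hypothetical.

```python
def residual_projection(x):
    """Q = I - x (x^T x)^{-1} x^T for a single-column design x (p = 1)."""
    n = len(x)
    sxx = sum(v * v for v in x)
    return [[(1.0 if r == c else 0.0) - x[r] * x[c] / sxx for c in range(n)]
            for r in range(n)]


def trace(a):
    return sum(a[k][k] for k in range(len(a)))


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


# Hypothetical single-column designs for m = 2 clusters of sizes 3 and 4.
designs = [[1.0, 2.0, 3.0], [1.0, -1.0, 0.5, 2.0]]
p = 1
m = len(designs)
n = sum(len(x) for x in designs)

q_matrices = [residual_projection(x) for x in designs]
total_trace = sum(trace(q) for q in q_matrices)     # should equal n - m * p

# Largest entrywise gap between Q Q and Q, as an idempotency check.
idempotency_gap = 0.0
for q in q_matrices:
    qq = matmul(q, q)
    for r in range(len(q)):
        for c in range(len(q)):
            idempotency_gap = max(idempotency_gap, abs(qq[r][c] - q[r][c]))
```

Since the diagonal of $\tilde Q_i$ equals the diagonal of $Q_i$, the traces coincide, and dividing by $n - mp$ makes the conditional mean of $J_{n2}$ equal to $\sigma^2$.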
Proof of Theorem 4. Using standard arguments as in the proof of 
Theorem 3 and the law of large numbers, when $nh^2 \to \infty$, the conditional 
bias of $\hat\Sigma$ is 



bias{vec(E)|P} = P (h 4 ) + o P (n 



-1/21 



and, by straightforward but tedious calculation and Lindeberg-Feller The- 
orem, when nh s — ► 0, it follows that 



n~' ~ vec^ — 2jJ - D 

where 



n 1 ' 2 vec(S - S) 7V p2 (O p2xl , (l/c 2 +p)A) 



A = £7{(eief ) ® (eief )} - vec(S) vec(S) T + 2cr 4 A 2 
+ cr 2 {5] ® Ai + Ai <g) S + A 3 } 
+ [varfe?].) -2cr 4 ]{A 4 - c 2 [2vec(Ai) vec(Ai) T 

- vec(Ai)vec(A 2 ) T - vec(A 2 ) vec(Ai) T ]} 
+ {2a 4 ci(ci - 1 - 7) + vax(ef 1 )ci(2 - a + 7)} vec(Ai) vec(Ai) T . 
Therefore, we have 
?i 1/2 vech(£-£) 

^+i)/2(°b(P+i)/2)xi, (1/C2 +p)(RTR P r 1 Rp&Rp(RpR P r 1 )- 

□ 

Proof of Theorem 5. By simple calculation, it can be seen that 
Ckj — Ckj 

m rii 



III Hi 

~ e l P +j,2s Otfu) W(ii) X(«) ) " X J J} TV(fl) e 

ri i=l z=i 

+ - £ £ e fc P +j,2s (Xj W (iI) X (i0 ) - 1 X J W (fl) xe 
n i=i 1=1 

^ — ^ ^ — ^ T ( T \ 1 r J^ 

+ ~ 2^ 2^ e fep+i,2 S ( x (a)W(ioX( a) ) X (a} W(«) 
n i=i i=i 



= T n \ + T n2 + T„3. 
First, we obtain that E(T n \) = 0, and, for any j, let 
F% = el p+j!2s (Xf il) W^X^)~ 1 Xf a) W w , l = l,...,m,i = l,...,m, 
and $F = (F_{11}, \ldots, F_{1n_1}, \ldots, F_{m1}, \ldots, F_{mn_m})^T$. Then, we have 



a 



2 



vav{T nl \V} = — l^FF 1 l r 

2 m rii m n r 



2=1 1=1 r=l V=l 



x X (i0 W( r „) X( r „) (X( ru ) W(r V ) X( r „) , 
x e kp+j,2s 



a 



2 



el p+j!i{q+1)p) E{Ai(U) 1 }e fep+ii ( ((?+1) p ) (l + o P (l)), 



n -fcp+i,(( 9 +i) P ) J 

by Lemma 1, straightforward but tedious calculation and the law of large 
numbers. Therefore, using conditions (1), (5) and (6) and the Lindeberg- 
Feller Theorem, it follows that 

VnT nl —> N(0,a 2 el p+j ^ q+1 ^E{A 1 (U)~ 1 }e kp+j: (( q+1 ) p )). 

In the same way, it can be shown that 

VnT n2 -^7V(0,eJ, +ii((9+1)p) {£;[ei({7)] + E 2 }e kp+j ^ q+1)p) ). 

Moreover, combining the results similar to (A.l) and (A. 2), we get 

,2^2 



Tn3 = h l 2 



m rii 



i=l 1=1 



Therefore, by independence of T n \ and T n 2, when nh 4 — ► 0, we have 
Vn{Ckj — Ckj} 

N&e^^^ElA^U)- 1 } +E[@ 1 (U)} + E 2 }e kp+Mq+1)p) ). 



D 

a 



Proof of Theorem 6. Obviously, 

a kj {u)-a kj {u)-hms{a kj {u)\V} = e^ +ii2s (X T WX)- 1 X r W(e+xe) = h{u) 

First of all, we approximate the random term $I_1(u)$. As $f(\cdot)$ and $\Omega(\cdot)$ 
have continuous derivatives on $[a, b]$, by arguments similar to Lemma 1 and a 
Neumann series expansion, it follows that 

(A.7) nH(yL T WK)- l H = /(tt)- 1 ^) -1 + P ((n/i 2 )" 1 / 2 + h) 
uniformly for u € [a, b] where S(u) = ^ j (8) 
By the asymptotic normality of $\sqrt{n^{-1} h f(u)} \, H^{-1} X^T W(\varepsilon + \mathbf{x}e)$ in the proof 
of Theorem 1, we get that 

1 



(A.8) 



-H-Vw(e + xe) 

n 



Op 



Therefore, using (A. 7) and (A.8), it can be seen that 

h(u) - iei p+ , 2s /(«)- 1 5(u)- 1 ^- 1 X T W( £ + xe) 



(A.9) 



= Op{{nh z ' 2 y l + {nil" 1 )- 1 ' 2 ). 
Next, we consider 

h(u) = -e T kp+h2 J(uy l S(uy l H~ l K T W{e + Ke) 



n 

m rn 



in i^i -i 

E E -^4 P+j , s V(u)~ lT * K h(Uii - u){e u + X$ei). 
»=i i=i n J\ u ) 



Let wu = el p+j Jl{u) l Yu. Then, 
{nhf{u)T{uy l ^ l } l / 2 I 2 {u) 



m ni 



(A.10) 



{nhf( U )r( U y r 1/2 J2J2 K 
i=i i=i 



Un - u 



wu(eu + Xf t e 



^{nhf(u)r{u)u y l / 2 h{u). 

Divide the interval $[a, b]$ into $n$ subintervals $J_r = [d_{r-1}, d_r)$, $r = 1, 2, \ldots, 
n - 1$, $J_n = [d_{n-1}, b]$, where $d_r = a + (b - a) r / n$. Define $\tilde U_{il} = d_r I(U_{il} \in J_r)$, $r = 
1, \ldots, n$, and it is obvious that $|\tilde U_{il} - U_{il}| = O(n^{-1})$. Then, by the law of large 
numbers for the random sequence $\{w_{il}(\varepsilon_{il} + X_{il}^T e_i)\}$, it follows that 



4M = EE 



r K fU a -u 



(A.ll) 



t=i ;=i 

+EE^ 

i=l 1=1 



h 

Ug - u 
h 

m rii 

0p(O+EE* 

i=l Z=l 



if 



h 



wu(eu + X^ei) 



wu(eu + Xj^i) 
Uu - u 



h 



wu(eu + Xje*) 



= P (/i- 1 )+/ 4 (n) 
uniformly for u G [a, 6]. By the definition of Uu, we have that 

'j \ m rii 

a r — U x 



r=l 



h 



E E G Jr)wu{eu + Xj ei 
i=i i=i 
Let $C_t = \sum_{r=1}^{t} \sum_{i=1}^{m} \sum_{l=1}^{n_i} I(U_{il} \in J_r) w_{il}(\varepsilon_{il} + X_{il}^T e_i) = \sum_{i=1}^{m} \sum_{l=1}^{n_i} I(a \leq U_{il} < 
d_t) w_{il}(\varepsilon_{il} + X_{il}^T e_i)$, $C_0 = 0$. Then, by Lemma 2, for any $t = 1, \ldots, n$ and $u \in 
[a, b]$, we get 

n) a.s., 

where W(-) is a Wiener process and 
G(c) = I" <J 2 [E(wl l \U ll = v)+E(X^X ll w 2 ll \U ll = v)]f(v)dv 

J a 

C m rii rii 



+ 



n *EE E E[wiiXl l Y 1 Xi r Wi r \Uii=vi,Ui r = v 2 ) 



x f{vi)f{v2)dvidv 2 - 
It follows, from Abel's transform, that 



h(u)=K 



b — u 
h 



n-l 



r=l 



K 



d r _|_i — u 



d r — u 
h 



Cr 



and 



n-l r 

E 

r=l 



A" 



d r +i — u 



K 



d r — u 
h 



[Cr ~ U^WiGidr))] 



< 



n-l 



max |C r -n 1 / 2 W(G(d r ))|E 



l<r<n 

Op(n x / 4 log n) 



r=l 



A' 



d r +i — u 



A 



d r — U 



Hence, 



I 4 (n) = n 1 / 2 A(^)^(G(6)) 



(A.12) 



n-l 
"E 



- tiV2 N ^ K 

r=l 

+ P (n 1 / 4 log 



d r +i — it 



A 



d r — U 



W(G(d r )) 



uniformly for u E [a, 6]. 

For a Wiener process, it is known that (Csörgő and Révész [5], page 44) 

$$\sup_{t \in [a,b]} |W(G(t + \delta)) - W(G(t))| = O(\{\delta \log(1/\delta)\}^{1/2}) \quad \text{a.s.},$$ 

when $\delta$ is any small number. Using this property and the boundedness of $K(\cdot)$, 
we have 



n-l 

E 

r=l 



K 



r+1 ~ u 



K 



d r — u 



h 



W(G(v))dK 



W{G{d r )) 
^ + Op ({n' 1 log n}V2 



h 



uniformly for $u \in [a, b]$. Together with (A.11) and (A.12), it follows that 
{nhy l ' 2 h{u) - h' 1 ' 2 J*K(^^j dW(G(v)) 



(A.13) 
Let 



P {{nh 2 Y 1 ' A \ogn+ (n/i 3 )" 1 / 2 ). 



and 



Y ln (u) = h~ 1 / 2 f K 

J a 

Y 2n {u) = h-V 2 J\( 

Y 3n (u) = h~ 1 / 2 £k( 



dW(G(v)), 

[r{v)f(v)} 1/2 dW{v-a) 



v — u 



h 



dW(v-a). 



For a Gaussian process, following a proof similar to that of the lemmas in Härdle [13], 
we have 



(A.14) 



\\Yin{u)-Y 2n (u)\\ 00 = P (h 1 / 2 ), 
\\(f{u)T{u)y l ' 2 Y 2n {u)-YM\\ 00 =Op(h l l 2 ). 
Therefore, by (A.13) and (A.14), 

||{n/ l /(n)r(n)}-V 2 /3(n)-y 3 nHI|oo=Op(^/ l 2 )- 1 / 4 logn+K 3 )-V2 + / l 1/2 ). 

From Theorem 2 and Theorem 3.1 of Bickel and Rosenblatt [1], when 
h = n~ p , l/5<p<l/3, we have 

P((-21og{/i/(& - a)}) 1 / 2 {y- 1 /2|| {nV(u)r(u)} -i/2 jr3(u) || oo _ Wn} < x) 

— > exp{— 2exp{— x}}, 

where oj n is defined in Theorem 6, as 

(A.15) vzr{a kj (u)\V} = {nh/(u)r(u)- V 1 } -1 ^ + o P (l)) 
uniformly for $u \in [a, b]$ by straightforward calculation and arguments similar 
to Lemma 1. Using the same proof as the first part of Theorem 2 of Fan and 
Zhang [10], the result of Theorem 6 is easily obtained. □ 
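The limiting distribution function $\exp\{-2\exp\{-x\}\}$ appearing in the proof is what determines the critical value of a simultaneous confidence band. The sketch below simply inverts that function; the function name is hypothetical, and how the resulting quantile is rescaled into a band (through the normalizing sequence and variance estimate of Theorem 6) is not reproduced here.

```python
import math


def extreme_value_quantile(level):
    """Solve exp(-2 * exp(-x)) = level for x, with 0 < level < 1."""
    return -math.log(-0.5 * math.log(level))


c95 = extreme_value_quantile(0.95)
# Plugging the quantile back in recovers the nominal level.
check = math.exp(-2.0 * math.exp(-c95))
```

For a 95% band this quantile is noticeably larger than the pointwise normal value 1.96, which reflects the price of covering the whole curve at once.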



Proof of Theorem 7. To prove the theorem, we first derive the rate 
of convergence for the bias and variance estimators. By (A.7) and similar 
arguments, we have 

\\&(a kj (u)\V) - biaB(a fci («)|2?)||oo = P (h 2 {n^/ 7 + o(h)}), 
where the rate n _1//7 comes from the pilot estimation of O(-) and the term 

" (3) 

o(h) comes from the coefficient in front of (•). 
Furthermore, by similar proof to Lemma 1, we get 



h 



n 



H~ 1 X T W 2 XH~ 1 - f(u)S(u) 



o P (l) 



where 5(u) = ( v ° °J ® Sl(u) and 



- H~ 1 X T Wxx T W XH~ 1 

n 



P (1) 



These results, together with (A.7) and the results of Theorems 3 and 4, 
give us 



var 



(d fcj (^)|P) -var(a fci (n)|P)||oo = P ((n/ l )- 1 {"" 1/2 + K 8 )" 1/2 })- 



Using (A.15) and the same proof as the second part of Theorem 2 of Fan 
and Zhang [10], the result of Theorem 7 is obtained. □ 